Missing Responses and IRT Ability Estimation:

Omits,  Choice,  Time  Limits,  and  Adaptive  Testing 

Abstract 

The  basic  equations  of  item  response  theory  (IRT)  provide  a  foundation  for 
inferring  examinees’  abilities  from  responses  to  different  test  items.  In  practice,  examinees 
do not generally provide a response to all items, for reasons that may or may not have
been  intended  by  the  test  administrator,  and  that  may  or  may  not  be  related  to  their  ability. 
The  mechanisms  that  produce  missingness  must  be  taken  into  account  if  correct  inferences 
are  to  be  drawn.  Using  concepts  introduced  by  Rubin  (1976),  we  discuss  the  implications 
for  Bayesian  and  direct  likelihood  ability  parameter  estimation  that  are  entailed  by  alternate 
test  forms,  targeted  testing,  adaptive  testing,  time  limits,  omitted  responses,  and  examinee 
choice  of  tasks.  Attention  is  focused  on  whether,  in  each  case,  the  mechanism  for 
missingness  is  “ignorable,”  and,  in  those  cases  in  which  it  is  not,  how  it  can  be  modeled. 

Key  words:  Adaptive  testing;  choice;  customized  tests;  item  response  theory;  missing 
data;  omitted  responses;  targeted  testing 


1.0  Introduction 


Item  response  theory  (IRT)  models  the  probability  of  an  examinee’s  responses  to 
test items as conditionally independent, given an unobservable ability parameter $\theta$ (Lord,
1980).  The  oft-cited  capacity  of  IRT  for  measuring  different  examinees  with  different  test 
items  implies  inference  in  the  presence  of  missing  data,  since  an  examinee  may  not  have 
provided  a  response  to  every  item  in  the  item  domain  of  interest.  The  following  types  of 
missingness  are  in  fact  routinely  encountered  in  applications  of  IRT: 

•  Alternate  test  forms.  Two  or  more  tests  with  similar  content  but  different  items  are 
often  employed  to  minimize  carry-over  effects,  reduce  fatigue  and  practice  effects, 
or  avoid  cheating.  An  examinee  is  administered  one  form  selected  at  random. 

•  Targeted  testing.  Tests  pitched  at  different  levels  of  difficulty  make  measurement 
more  efficient  when  background  information  related  to  ability,  such  as  grade  or 
courses  taken,  can  be  used  to  determine  which  test  to  administer  to  each  examinee. 

•  Adaptive  testing.  Testing  can  also  be  made  more  efficient  if  each  item  presented  to 
an  examinee  is  selected  in  light  of  responses  thus  far. 

•  Not-reached  items.  Under  typical  testing  conditions,  some  examinees  will  not 
reach  the  last  items  on  a  test  because  of  the  time  limit. 

•  Omitted  items.  Even  when  an  item  has  been  presented  and  an  examinee  has  time  to 
consider  it,  the  examinee  will  sometimes  choose  not  to  respond. 

•  Examinee  choice.  Examinees  may  be  allowed  to  examine  a  number  of  items,  and 
choose  which  to  answer,  subject  to  specified  constraints  (e.g.,  “Answer  any  two  of 
the  following  four  questions”). 

When  incomplete  data  are  encountered,  the  IRT  model  that  determines  responses  is 
embedded  in  a  more  encompassing  model  that  determines  which  responses  will  be 
observed  and  which  will  be  missing.  This  paper  discusses  the  implications  that  missing 
responses  hold  for  direct  likelihood  and  Bayesian  inferences  about  examinee  ability 
parameters,  assuming  item  parameters  are  known.  When  can  the  process  that  causes 
missingness  be  ignored?  When  it  cannot  be  ignored,  how  can  it  be  modeled?  How  can 
conventional  IRT  methods  for  missing  responses  be  evaluated  in  this  framework?  Section 
2  extends  IRT  notation  to  handle  missingness,  using  concepts  and  notation  from  Little  and 
Rubin  (1987)  and  Rubin  (1976).  Next,  Rubin’s  (1976)  conditions  for  when  the 
missingness  process  can  be  ignored  are  reviewed.  Sections  3-8  address  the  six  types  of 
missingness  listed  above.  Section  9  is  a  non-technical  summary  of  the  main  results. 

2.0 Background

Section  2.1  gives  notation  for  IRT.  Section  2.2  extends  IRT  to  missing  data 
situations  using  concepts  and  notation  from  Rubin  (1976)  and  Little  and  Rubin  (1987),  and 
Section  2.3  lists  Rubin’s  results  on  ignorability. 

2.1  Notation  for  IRT 

Definition. An IRT model with examinee parameter $\theta$ is said to satisfy local independence (LI) in a domain of $n$ items if

$$\mathrm{Prob}(U_1 = u_1, \ldots, U_n = u_n \mid \theta, \beta_1, \ldots, \beta_n, y) = \prod_{j=1}^{n} \mathrm{Prob}(U_j = u_j \mid \theta, \beta_j),$$

or, written more compactly,

$$\mathrm{Prob}(U = u \mid \theta, \beta, y) = f_\theta(u) = \prod_{j=1}^{n} f_\theta(u_j), \tag{2.1}$$

where $U_j$ is the response variable for Item $j$, $u_j$ represents a value thereof, and $u = (u_1, \ldots, u_n)$; $f_\theta(\cdot)$ is the response function, interpreted as applying to individual items or sets of items in accordance with its arguments; $\beta_j$ is a possibly vector-valued parameter characterizing the dependency of response probabilities to Item $j$ on $\theta$, and $\beta = (\beta_1, \ldots, \beta_n)$; and $y$ denotes covariate information about examinees, such as age or courses taken.

It  will  be  seen  below  that  results  for  alternate  test  forms,  targeted  testing,  adaptive 
testing,  and  not-reached  items  can  be  obtained  without  further  specification  of  models  in 
addition  to  the  IRT  model.  Results  for  omitted  items  and  choice  items,  however,  require 
speculations  about,  and  modeling  of,  the  examinees’  perspective  on  the  missingness 
process.  Sections  dealing  with  these  cases  focus  on  a  class  of  IRT  models  that  is  common 
in  educational  testing,  namely  those  satisfying  local  independence,  unidimensionality,  and 
monotonicity  (Holland  &  Rosenbaum,  1986).  We  adapt  an  acronym  from  Zwick  (1990): 

Definition. An IRT model is said to be $\mathrm{SMURFLI}^{(2)}$ if it satisfies the following conditions:

•  Strict Monotonicity; i.e., $\theta' > \theta'' \Rightarrow \mathrm{Prob}(U_j = 1 \mid \theta') > \mathrm{Prob}(U_j = 1 \mid \theta'')$.

•  Unidimensional Response Functions; i.e., the domain of $\theta$ is $\mathbb{R}^1$.

•  Local Independence.

•  2 possible responses, correct or incorrect, can be observed for each item; specifically, $U_j = 1$ indicates a correct answer and $U_j = 0$ an incorrect answer.

Under the Rasch model for dichotomous items, for example, $\theta$ and $\beta_j$ are real numbers, and $f_\theta(u_j) = \exp[u_j(\theta - \beta_j)] / [1 + \exp(\theta - \beta_j)]$ (Figure 1).

[Figure  1  about  here] 
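The Rasch response function and the local-independence product in (2.1) can be sketched in a few lines. This is a minimal illustration only: the function names `rasch_prob` and `pattern_prob` and the two item difficulties are assumptions made for the example, not values taken from Figure 1.

```python
import itertools
import math

def rasch_prob(u, theta, beta):
    """f_theta(u_j) for one dichotomous item under the Rasch model."""
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def pattern_prob(u, theta, betas):
    """f_theta(u): product over items, as local independence (2.1) requires."""
    prob = 1.0
    for u_j, beta_j in zip(u, betas):
        prob *= rasch_prob(u_j, theta, beta_j)
    return prob

# Hypothetical difficulties for two items (not the values behind Figure 1).
betas = [-1.0, 1.0]

# The four possible response patterns have probabilities that sum to one.
total = sum(pattern_prob(u, 0.5, betas)
            for u in itertools.product((0, 1), repeat=2))
```

Because the pattern probability is a plain product, the same function serves for any subset of items, which is the sense in which the response function applies "to individual items or sets of items."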

If there is no possibility of missing responses, (2.1) is interpreted as a likelihood function, say $L(\theta \mid u)$, once a particular value $u$ of $U$ has been observed. Direct likelihood inferences are based solely on relative values of $L$ at different values of $\theta$. One might say, for example, that the probability of $u$ at $\theta'$ is twice that at $\theta''$, or that it attains its maximum at $\hat\theta$, the maximum likelihood estimate (MLE). Under direct likelihood inference, the MLE is interpreted only as a feature of the likelihood induced by the data that were actually observed, not as a realized value of an estimator with a reference distribution concerning repeated samples of $U$ with a fixed “true” $\theta$. This presentation does not focus on sampling distribution inferences, although some remarks will be made in passing. Bayesian inferences are based on the posterior distribution for $\theta$ given $u$, or

$$p(\theta \mid u) = K(u)\, L(\theta \mid u)\, p(\theta), \tag{2.2}$$

where $K(u)$ is a normalizing constant and $p(\theta)$ is the prior distribution for $\theta$. The first panel of Figure 2 shows the $L(\theta \mid u)$ that corresponds to $u = (1,0)$ for the two items in Figure 1; the second panel is the $p(\theta \mid u)$ that results from $L(\theta \mid u)$ when $p(\theta) = N(0,1)$.

[Figure  2  about  here] 
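A likelihood and posterior of the kind shown in Figure 2 can be approximated on a grid of ability values. The sketch below assumes a Rasch model with two hypothetical item difficulties (the actual values behind Figures 1 and 2 are not given in the text) and a $N(0,1)$ prior. The grid limits and all function names are illustrative choices.

```python
import math

def rasch_prob(u, theta, beta):
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def likelihood(theta, u, betas):
    """L(theta | u): the product in (2.1), read as a function of theta."""
    out = 1.0
    for u_j, b_j in zip(u, betas):
        out *= rasch_prob(u_j, theta, b_j)
    return out

def normal_pdf(x, mean=0.0, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def grid_posterior(u, betas, grid):
    """Discretized (2.2): weights proportional to L(theta | u) p(theta)."""
    unnorm = [likelihood(t, u, betas) * normal_pdf(t) for t in grid]
    total = sum(unnorm)  # plays the role of 1 / K(u) on the grid
    return [w / total for w in unnorm]

betas = [-1.0, 1.0]                         # hypothetical item difficulties
grid = [-4.0 + 0.01 * i for i in range(801)]
u = (1, 0)

like = [likelihood(t, u, betas) for t in grid]
post = grid_posterior(u, betas, grid)
theta_mle = grid[like.index(max(like))]     # grid approximation to the MLE
post_mean = sum(t * w for t, w in zip(grid, post))
```

Direct likelihood inference uses only relative heights of `like`; the Bayesian summary additionally depends on the prior through `post`.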

2.2  Notation  for  Missing  Responses 

Suppose  that  an  examinee  provides  responses  to  only  a  subset  of  the  items.  The 
data  thus  consist  of  (i)  the  identification  of  the  items  to  which  responses  are  observed  and 
(ii) the responses to those items. We consider inference about $\theta$ from this extended
observation,  assuming  the  IRT  model  and  item  parameters  are  known,  adapting  notation 
and  terminology  from  Little  and  Rubin  (1987)  and  Rubin  (1976): 

•  $U = (U_1, \ldots, U_n)$ is the (possibly hypothetical) random vector of responses to all items.

•  $M = (M_1, \ldots, M_n)$ is a “missing-data indicator,” with each element taking a value of 0 or 1. If $m_j = 1$, the value of $U_j$ is observed; if $m_j = 0$, it is missing.

•  $V = (V_1, \ldots, V_n)$ conveys the data that are actually observed: $v_j = u_j$ if $m_j = 1$ but $v_j = *$ if $m_j = 0$, where $*$ indicates that the value of $U_j$ is missing.

A realized value of $M$, say $m$, effects a partition of $U$, $u$, $V$, and $v$ according to which elements are observed and which are missing. We write $U = (U_{\mathrm{mis}}, U_{\mathrm{obs}})$ to distinguish the missing and observed elements of $U$; similarly, $u = (u_{\mathrm{mis}}, u_{\mathrm{obs}})$. As with $u$ and $m$, $v$ denotes a realized value of $V$. Note that $v$ is inferentially equivalent to $(u_{\mathrm{obs}}, m)$, and that by (2.1), $f_\theta(u) = f_\theta(u_{\mathrm{mis}})\, f_\theta(u_{\mathrm{obs}})$.
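The relationship among the complete responses, the missing-data indicator, and the observed vector can be made concrete with a small sketch. The helper names `observe` and `split` and the example vectors are hypothetical. Items are indexed from 0 here purely for programming convenience.

```python
MISSING = "*"

def observe(u, m):
    """Build v from a complete response vector u and a missing-data indicator m."""
    if len(u) != len(m):
        raise ValueError("u and m must have the same length")
    return tuple(u_j if m_j == 1 else MISSING for u_j, m_j in zip(u, m))

def split(u, m):
    """Partition u into (u_mis, u_obs), each a dict keyed by 0-based item index."""
    u_mis = {j: u_j for j, (u_j, m_j) in enumerate(zip(u, m)) if m_j == 0}
    u_obs = {j: u_j for j, (u_j, m_j) in enumerate(zip(u, m)) if m_j == 1}
    return u_mis, u_obs

u = (1, 0, 1, 1)   # hypothetical complete responses
m = (1, 1, 0, 0)   # last two items not observed
v = observe(u, m)  # (1, 0, '*', '*')
u_mis, u_obs = split(u, m)
```

Note that `v` alone is enough to recover both `m` and `u_obs`, which is what "inferentially equivalent" means in this setting.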

Inferences about $\theta$ must be based on the data that are actually observed, or $V = (U_{\mathrm{obs}}, M)$. Modeling the hypothetical complete data vector $(U, M)$, even if there is no intention of observing a response to every item, forces us to explicate our beliefs about the relationships among ability, item response, and missingness. To this end, define $g_\phi(m \mid u) = \mathrm{Prob}(M = m \mid U = u, \phi)$, with $\phi$ the possibly vector-valued parameter of the missingness process (which may include $\theta$ itself). As we shall see, the form of $g$ will depend on the process, and elements of $\phi$ can characterize the examinee, the testing situation, or both. So defined,

$$\begin{aligned}
\mathrm{Prob}(U = u, M = m \mid \theta, \phi) &= \mathrm{Prob}(U = u \mid \theta, \phi)\, \mathrm{Prob}(M = m \mid U = u, \theta, \phi) \\
&= \mathrm{Prob}(U = u \mid \theta)\, \mathrm{Prob}(M = m \mid U = u, \theta, \phi).
\end{aligned}$$

Whenever all potential responses may not be observed for any reason, even if they all do turn out to be observed, the data are $v$. The likelihood function is obtained as

$$L(\theta, \phi \mid v) = \delta_{\theta\phi} \int f_\theta(u_{\mathrm{mis}}, u_{\mathrm{obs}})\, g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})\, du_{\mathrm{mis}}, \tag{2.3}$$

with $\delta_{\theta\phi}$ taking the value 1 if a value $(\theta, \phi)$ is in the parameter space $\Omega_{\theta\phi}$ and 0 if not. The realized values of observed responses $u_{\mathrm{obs}}$ are constant in (2.3), and marginalization is over the unknown values of the unobserved responses $U_{\mathrm{mis}}$. The observed-data likelihood is thus a weighted average over all complete-data likelihoods for full response vectors $u$ that are in accord with the observed responses to the observed items $u_{\mathrm{obs}}$. The weights are proportional to the probabilities of these potential response patterns for the different values $u_{\mathrm{mis}}$, with $m$ and $u_{\mathrm{obs}}$ fixed. In the context of IRT, the probability for the observed responses can be factored out and brought outside the integral, so that

$$L(\theta, \phi \mid v) = \delta_{\theta\phi}\, f_\theta(u_{\mathrm{obs}}) \int f_\theta(u_{\mathrm{mis}})\, g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})\, du_{\mathrm{mis}}. \tag{2.4}$$

Appropriate likelihood inferences are based on relative values of $L(\theta, \phi \mid v)$ at various values of $(\theta, \phi)$. Bayesian inferences are based on the posterior distribution

$$p(\theta, \phi \mid v) = K(v)\, L(\theta, \phi \mid v)\, p(\theta, \phi), \tag{2.5}$$

with $p(\theta, \phi)$ the prior distribution for $(\theta, \phi)$. In general, the correct likelihood function for $\theta$ under IRT with missing responses involves a nuisance parameter $\phi$ and depends not on just the responses that were observed, through $f_\theta(u_{\mathrm{obs}})$, but also on the responses that were not observed, through $f_\theta(u_{\mathrm{mis}})\, g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})$ under the integral in (2.4).
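For dichotomous items the integral in (2.4) is a finite sum over the possible values of the missing responses, so the observed-data likelihood can be computed by enumeration. The following is a sketch under stated assumptions: a Rasch model with hypothetical difficulties, and two illustrative missingness mechanisms (`g_constant` and `g_omit_if_wrong`) that are invented for this example and do not come from the paper.

```python
import itertools
import math

def rasch_prob(u, theta, beta):
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def observed_likelihood(theta, v, betas, g):
    """Sum form of (2.4) for 0/1 items; g(m, u) plays the role of g_phi(m | u)."""
    m = tuple(0 if v_j == "*" else 1 for v_j in v)
    mis = [j for j, m_j in enumerate(m) if m_j == 0]
    f_obs = 1.0
    for j, v_j in enumerate(v):
        if v_j != "*":
            f_obs *= rasch_prob(v_j, theta, betas[j])
    total = 0.0
    for fill in itertools.product((0, 1), repeat=len(mis)):
        u = list(v)
        f_mis = 1.0
        for j, u_j in zip(mis, fill):
            u[j] = u_j
            f_mis *= rasch_prob(u_j, theta, betas[j])
        total += f_mis * g(m, tuple(u))   # weight each completion by g
    return f_obs * total

def g_constant(m, u):
    return 0.25  # hypothetical: the same probability whatever u is

def g_omit_if_wrong(m, u):
    """Hypothetical mechanism: an item is more likely missing if it would be wrong."""
    p = 1.0
    for m_j, u_j in zip(m, u):
        p_missing = 0.2 if u_j == 1 else 0.8
        p *= p_missing if m_j == 0 else 1.0 - p_missing
    return p

betas = [-1.0, 0.0, 1.0]   # hypothetical item difficulties
v = (1, "*", 0)

def ratio(g, theta_a, theta_b):
    """Likelihood ratio for theta_a versus theta_b under mechanism g."""
    return (observed_likelihood(theta_a, v, betas, g)
            / observed_likelihood(theta_b, v, betas, g))

def naive(theta):
    """Likelihood from the observed responses alone."""
    return rasch_prob(1, theta, betas[0]) * rasch_prob(0, theta, betas[2])
```

Comparing `ratio(g_constant, 1.0, -1.0)` with `naive(1.0) / naive(-1.0)` shows agreement, whereas the second mechanism changes the likelihood ratio because the term under the sum then varies with the ability value.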

2.3  Conditions  for  Ignorability 

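Ignoring the missingness process when drawing inferences about $\theta$ means replacing the correct likelihood $L(\theta, \phi \mid v)$ with a facsimile of (2.1) that uses $u_{\mathrm{obs}}$ alone,

$$L^*(\theta \mid u_{\mathrm{obs}}) = \delta_\theta\, f_\theta(u_{\mathrm{obs}}), \tag{2.6}$$

with $\delta_\theta$ the analogous indicator for the parameter space of $\theta$. Direct likelihood inferences about $\theta$ that ignore the missingness process simply compare values of $L^*$ at various values of $\theta$, and Bayesian inferences that ignore the missingness process proceed from a facsimile of (2.2) obtained as

$$p^*(\theta \mid u_{\mathrm{obs}}) = K(u_{\mathrm{obs}})\, L^*(\theta \mid u_{\mathrm{obs}})\, p(\theta). \tag{2.7}$$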

It  is  a  pleasant  state  of  affairs  when  the  missingness  process  can  be  ignored,  since 
(2.6)  and  (2.7)  don’t  require  the  specification  of  g,  and  standard  computing  algorithms  can 
be  used.  Depending  on  why  the  missing  responses  were  missing,  however,  these 
procedures  need  not  lead  to  the  correct  inferences.  Rubin  (1976)  specifies  conditions  under 
which  a  missingness  process  can  be  ignored  under  sampling  distribution,  direct  likelihood, 
and  Bayesian  inference.  Useful  sufficient  conditions  for  ignorability  under  direct 
likelihood  and  Bayesian  inference,  the  focus  of  this  paper,  involve  the  following  concepts: 

Definition. Missing responses are missing completely at random (MCAR) if for each value of $\phi$ and for each fixed value $m$, $g_\phi(m \mid u)$ takes the same value for all $u$. That is, $g_\phi(m \mid u) = g_\phi(m)$.

Definition. Missing responses are missing at random (MAR) if for each value of $\phi$ and for all fixed values $m$ and $u_{\mathrm{obs}}$, $g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})$ takes the same value for all $u_{\mathrm{mis}}$. That is, $g_\phi(m \mid u) = g_\phi(m \mid u_{\mathrm{obs}})$.

Definition. The parameter $\theta$ is distinct (D) from $\phi$ if their joint parameter space factors into a $\theta$-space and a $\phi$-space, and when prior distributions are specified for $\theta$ and $\phi$, they are independent.

Remarks. (a) MCAR implies MAR. (b) For direct-likelihood distinctness to be satisfied, conditioning on $\phi$ must not change the support of the likelihood function for $\theta$. For Bayesian distinctness, conditioning on $\phi$ must also not change belief about $\theta$. (c) Taken together, MCAR and D imply that the values of both the observed and the missing responses are independent of the pattern of missingness. MAR and D together imply that the values of the missing responses are independent of the pattern of missingness, conditional on the values of the observed responses.
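For a small number of dichotomous items, the MCAR and MAR definitions can be checked by brute force for a given missingness pattern. The sketch below is illustrative only: the three mechanisms are hypothetical, the checks are for one fixed pattern at a time (the definitions require them for every pattern), and items are indexed from 0.

```python
import itertools

def is_mcar(g, m, n, tol=1e-12):
    """True if g(m, u) is the same for every complete 0/1 vector u of length n."""
    values = [g(m, u) for u in itertools.product((0, 1), repeat=n)]
    return max(values) - min(values) <= tol

def is_mar(g, m, n, tol=1e-12):
    """True if, for every u_obs, g(m, u) does not vary with u_mis."""
    obs = [j for j in range(n) if m[j] == 1]
    groups = {}
    for u in itertools.product((0, 1), repeat=n):
        key = tuple(u[j] for j in obs)          # the observed part of u
        groups.setdefault(key, []).append(g(m, u))
    return all(max(vals) - min(vals) <= tol for vals in groups.values())

def g_random_form(m, u):
    """Hypothetical: one of two patterns chosen by coin flip, whatever u is."""
    return 0.5 if m in ((1, 0, 1), (1, 1, 0)) else 0.0

def g_branch(m, u):
    """Hypothetical: item 0 always given; item 2 follows a correct answer, item 1 an error."""
    wanted = (1, 0, 1) if u[0] == 1 else (1, 1, 0)
    return 1.0 if m == wanted else 0.0

def g_self_censor(m, u):
    """Hypothetical: item 2 goes missing exactly when its response would be wrong."""
    return 1.0 if m == (1, 1, u[2]) else 0.0

checks = {
    "random form": (is_mcar(g_random_form, (1, 0, 1), 3), is_mar(g_random_form, (1, 0, 1), 3)),
    "branching":   (is_mcar(g_branch, (1, 0, 1), 3), is_mar(g_branch, (1, 0, 1), 3)),
    "self-censor": (is_mcar(g_self_censor, (1, 1, 0), 3), is_mar(g_self_censor, (1, 1, 0), 3)),
}
```

The branching mechanism depends on an observed response, so it is MAR without being MCAR; the self-censoring mechanism depends on the very response that goes missing, so it is neither.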

We  are  now  in  a  position  to  summarize  Rubin’s  results  for  direct-likelihood  and 
Bayesian  inferences.  First,  a  more  easily  verified  sufficient  condition: 

Theorem 2.1 (Rubin, 1976, p. 581). When making direct-likelihood or Bayesian inferences about $\theta$, it is appropriate to ignore the process that causes missing data if MAR and D are satisfied.

Proof. When MAR is satisfied, $g$ does not depend on $u_{\mathrm{mis}}$ and can be brought out of the integral in (2.3), which then simply integrates to one. If D is satisfied as well, $L(\theta, \phi \mid v)$ depends on $\theta$ only through $f_\theta(u_{\mathrm{obs}})$. □
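The steps of this argument can be written out explicitly. The display below is a worked restatement rather than part of the original proof; it starts from the factored form (2.4), and the separate indicators $\delta_\theta$ and $\delta_\phi$ for the two parameter spaces are notation assumed here for illustration.

```latex
\begin{align*}
L(\theta,\phi \mid v)
  &= \delta_{\theta\phi}\, f_\theta(u_{\mathrm{obs}})
     \int f_\theta(u_{\mathrm{mis}})\,
          g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})\, du_{\mathrm{mis}}
     && \text{by (2.4)} \\
  &= \delta_{\theta\phi}\, f_\theta(u_{\mathrm{obs}})\,
     g_\phi(m \mid u_{\mathrm{obs}})
     \int f_\theta(u_{\mathrm{mis}})\, du_{\mathrm{mis}}
     && \text{by MAR} \\
  &= \delta_{\theta\phi}\, f_\theta(u_{\mathrm{obs}})\,
     g_\phi(m \mid u_{\mathrm{obs}})
     && \text{the integral equals one} \\
  &= \bigl[\delta_\theta\, f_\theta(u_{\mathrm{obs}})\bigr]
     \bigl[\delta_\phi\, g_\phi(m \mid u_{\mathrm{obs}})\bigr]
     && \text{by D.}
\end{align*}
% Hence, for any two ability values and any fixed phi with a positive second factor,
\[
\frac{L(\theta',\phi \mid v)}{L(\theta'',\phi \mid v)}
  = \frac{\delta_{\theta'}\, f_{\theta'}(u_{\mathrm{obs}})}
         {\delta_{\theta''}\, f_{\theta''}(u_{\mathrm{obs}})} .
\]
```

The second bracket is free of $\theta$, so it cancels from every likelihood ratio and is absorbed into the normalizing constant of the posterior for $\theta$ when the priors are independent.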

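Under weaker conditions, the integral need not drop out but its value does not depend on $\theta$. Necessary and sufficient (NS) conditions are given below, without proofs.

Theorem 2.2 (Rubin, 1976, p. 586). Suppose $L(\theta, \phi \mid v) > 0$ for all $\theta \in \Omega_\theta$. All likelihood ratios for $\theta$ ignoring the process that causes missing data are correct for all $\phi \in \Omega_\phi$ if and only if (a) $\Omega_{\theta\phi} = \Omega_\theta \times \Omega_\phi$ and (b) for each $\phi \in \Omega_\phi$, the quantity

$$E\{ g_\phi(m \mid U_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi \} \tag{2.8}$$

takes the same positive value for all $\theta$.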

Remarks. Theorem 2.2 says that for direct-likelihood ignorability to hold in the IRT context, given any value of the missingness parameter $\phi$ and the observed data $v$, the probability of the observed pattern of missingness must be the same for all values of $\theta$. This is true if MAR and D hold, since these constitute sufficient conditions for ignorability. If D holds but MAR does not, the varying values of $g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})$ under the integral in (2.3) must be exactly counterbalanced by varying values of $f_\theta(u_{\mathrm{mis}}, u_{\mathrm{obs}})$. While it is straightforward to construct artificial examples in which this happens, it appears rare in practice to find applications in which the conditions in Theorem 2.2 are satisfied but MAR is not. Section 7 below, for example, discusses the counter-intuitive circumstances that would have to hold if intentional omitting were to meet this condition.




Theorem 2.3 (Rubin, 1976, p. 587). The posterior distribution of $\theta$ ignoring the process that causes missing data equals the correct posterior distribution of $\theta$ if and only if

$$E\{ g_\phi(m \mid U_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta \} \tag{2.9}$$

takes a constant positive value.

Remark. Theorem 2.3 says that for Bayesian ignorability to hold in the IRT context, given the observed data $v$, the probability of the observed pattern of missingness must be the same for all values of $\theta$.

3.0  Alternate  Test  Forms 

“Alternate  test  forms”  are  sets  of  items  that  all  fit  the  same  IRT  model,  and  the  test 
administrator  is  indifferent  as  to  which  form  an  examinee  is  presented.  The  item  sets  on 
different  forms  may  overlap.  The  form  an  examinee  receives  depends  on  a  random  process 
specified  by  the  administrator,  such  as  a  coin  flip  or  a  form-spiraling  scheme.  In  practice, 
IRT inferences about $\theta$ from alternate test forms are commonly based on $L^*(\theta \mid u_{\mathrm{obs}})$.

The use of $K$ alternate test forms implies that only $K$ missingness patterns, say $m^{(1)}, \ldots, m^{(K)}$, can occur, where all the item-level elements of pattern $m^{(k)}$ are zero except those which correspond to the items that appear in Form $k$. Denote by $\phi_k$ the administrator-determined values $\mathrm{Prob}(M = m^{(k)})$. Assuming an LI (locally independent) IRT model means that $f_\theta(u)$ is as given in (2.1); that is, the values of item responses are governed by $\theta$ alone, regardless of which items would be administered. Even though the items of only one form will actually be presented, it is possible to express our assumptions about the connection between the (hypothetical) values of the complete response pattern and the probability of the missingness pattern as follows:

$$g_\phi(m \mid u) = \begin{cases} \phi_k & \text{for all } u \text{ if } m = m^{(k)}, \\ 0 & \text{otherwise.} \end{cases} \tag{3.1}$$

Theorem  3.1.  Random  assignment  of  alternate  test  forms  satisfies  MCAR,  and  therefore 
MAR  as  well. 

Proof. This follows immediately from (3.1), since the values of $g$ do not depend on $u$. □

Theorem 3.2. The missingness induced by random administration of alternate test forms is ignorable under direct likelihood and Bayesian inference.

Proof. By Theorem 3.1, MAR is satisfied. Verifying D for likelihood inference requires that the $\theta$ and $\phi$ parameter spaces are distinct; this follows from (3.1). Verifying D for Bayesian inference requires that prior beliefs about $\theta$ and $\phi$ be independent; this also follows from (3.1). Ignorability follows from Theorem 2.1. □
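Random form assignment can be sketched as follows. The example assumes a Rasch model, two non-overlapping hypothetical forms with equal assignment probabilities, and invented item difficulties; `administer`, `l_star`, and `full_likelihood` are illustrative names, not part of any established package.

```python
import math
import random

BETAS = [-1.5, -0.5, 0.5, 1.5]               # hypothetical known item difficulties
FORMS = {1: (1, 1, 0, 0), 2: (0, 0, 1, 1)}   # hypothetical patterns m^(1), m^(2)
PHI = {1: 0.5, 2: 0.5}                       # administrator-set Prob(M = m^(k))

def rasch_prob(u, theta, beta):
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def administer(theta, rng):
    """Pick a form with probabilities PHI (never looking at responses), then record v."""
    k = rng.choices(list(FORMS), weights=[PHI[f] for f in FORMS])[0]
    v = []
    for m_j, beta_j in zip(FORMS[k], BETAS):
        if m_j == 1:
            v.append(1 if rng.random() < rasch_prob(1, theta, beta_j) else 0)
        else:
            v.append("*")
    return k, tuple(v)

def l_star(theta, v):
    """L*(theta | u_obs): product over the observed items only."""
    out = 1.0
    for v_j, beta_j in zip(v, BETAS):
        if v_j != "*":
            out *= rasch_prob(v_j, theta, beta_j)
    return out

def full_likelihood(theta, k, v):
    """L(theta, phi | v) under (3.1): the form probability times L*."""
    return PHI[k] * l_star(theta, v)

rng = random.Random(0)
k, v = administer(0.3, rng)
ratio_full = full_likelihood(1.0, k, v) / full_likelihood(-1.0, k, v)
ratio_star = l_star(1.0, v) / l_star(-1.0, v)
```

Because the form probability is a constant that does not involve the ability value, it cancels from `ratio_full`, which therefore matches `ratio_star`.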

4.0  Targeted  Testing 

“Targeted  testing”  involves  multiple  test  forms  in  which  the  distributions  of  item 
difficulty  differ  purposefully  from  form  to  form.  Exploiting  the  facts  that  (1)  estimates  of 
$\theta$ are more precise when an examinee is administered items with difficulties near the value
of $\theta$, and (2) covariates $y$ that are related to $\theta$ may be available, targeted testing uses an
examinee’s  covariate  to  select  a  test  form  that  is  likely  to  be  more  informative  than  other 
otherwise  similar  forms.  For  example,  an  easy  form  and  a  hard  form  might  be  constructed 
from  a  set  of  n  items  calibrated  together  under  the  same  IRT  model,  and  the  easy  form 
administered  to  first  graders  and  the  hard  form  to  second  graders. 

As with alternate test forms, the existence of $K$ forms for targeted testing implies that only $K$ patterns of $M$, again denoted $m^{(1)}, \ldots, m^{(K)}$, can be realized. As with alternate test forms, the missingness parameter $\phi$ is the vector of indicator variables $\phi_k$ indicating the test form selected for the examinee. The parameter of the missingness process now consists of the administrator-determined values $\phi_k(y) = \mathrm{Prob}(M = m^{(k)} \mid y)$, which indicate the probability that an examinee with covariate $y$ will be administered Form $k$. For at least one $k$ and two values $y'$ and $y''$, $\phi_k(y') \ne \phi_k(y'')$. This happens when the test administrator knows $p(\theta \mid y') \ne p(\theta \mid y'')$ and that the difficulty of Form $k$ is better suited to the typical examinee with $Y = y'$ than one with $Y = y''$, or vice versa.

Theorem  4.1.  Targeted  assignment  of  test  forms  based  on  examinee  covariates  satisfies 
MCAR,  and  therefore  MAR  as  well. 

Proof. Values of $g$ do not depend on $u$. □

Theorem 4.2. The missingness induced by administration of test forms based on examinee covariates $y$ is ignorable under direct likelihood inference if all values of $\theta$ can occur at all values of $y$.




Proof. By Theorem 4.1, MAR is satisfied. Verifying D for likelihood inference requires that $\theta$ is distinct from $\phi$; that is, conditioning on $\phi$ must not change the support of the likelihood function for $\theta$. This condition fails under targeted testing only if, for some assignments of test forms, certain values of $\theta$ cannot occur. □


Theorem 4.3. The missingness induced by administration of test forms based on examinee covariates $y$ is ignorable under Bayesian inference only if $\theta$ and $y$ are independent (i.e., the targeting is wholly ineffective).

Proof. Under the targeted-testing missingness mechanism, $g$ does not depend on $u$ and the missingness parameter $\phi$ is fixed a priori by the test administrator, so (2.9) simplifies to $E\{g_\phi(m) \mid m, \theta\}$, or the probability of a given missingness pattern given $\theta$. By Theorem 2.3, ignorability obtains under Bayesian inference iff this quantity takes a constant positive value. But since targeted testing means that $g$ is not constant with respect to $y$, $g$ will be constant for $\theta$ only if $y$ is independent of $\theta$. □


Theorem 4.4. The missingness induced by targeted testing is ignorable under Bayesian inference conditional on $y$; i.e., the correct posterior is proportional to $L^*(\theta \mid u_{\mathrm{obs}})\, p(\theta \mid y)$.

Proof. As in the preceding proof, under targeted testing the expression (2.9) simplifies because $g$ does not depend on $u$ and the missingness parameter $\phi$ is fixed a priori by the test administrator. If we condition on $y$, it becomes $E\{g_\phi(m) \mid m, \theta, y\}$ or, by the definition of $g$, simply $E\{g_\phi(m) \mid m, y\}$, a constant with respect to $\theta$, as required by Theorem 2.3. □


Remarks. By Theorem 4.2, the correct value of the MLE is obtained for $\theta$ under targeted testing. For correct Bayesian inference, however, the relationship between $y$ and $\theta$ must be taken into account, so $L^*(\theta \mid u_{\mathrm{obs}})\, p(\theta \mid y)$ yields correct inferences but $L^*(\theta \mid u_{\mathrm{obs}})\, p(\theta)$ generally does not.
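The contrast between the two Bayesian analyses can be illustrated numerically on a grid. Everything specific in this sketch is a hypothetical choice made for illustration: two grades with normal ability distributions centered at -1 and 1, deterministic assignment of an easy form to the lower grade, Rasch items with invented difficulties, and an invented response pattern.

```python
import math

def rasch_prob(u, theta, beta):
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

GRID = [-6.0 + 0.02 * i for i in range(601)]
P_Y = {"grade1": 0.5, "grade2": 0.5}            # hypothetical p(y)
PRIOR_MEAN = {"grade1": -1.0, "grade2": 1.0}    # hypothetical p(theta | y) = N(mean, 1)
PHI_EASY = {"grade1": 1.0, "grade2": 0.0}       # Prob(easy form | y), set by administrator
EASY_BETAS = [-2.0, -1.0, 0.0]                  # hypothetical easy-form difficulties

def l_star(theta, u_obs):
    out = 1.0
    for u_j, b_j in zip(u_obs, EASY_BETAS):
        out *= rasch_prob(u_j, theta, b_j)
    return out

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean(weights):
    return sum(t * w for t, w in zip(GRID, weights))

u_obs = (1, 1, 0)   # hypothetical responses of an examinee given the easy form

# Correct posterior without conditioning on y:
# L* times Prob(easy form | theta) p(theta) = L* times sum_y phi_easy(y) p(theta | y) p(y).
correct = normalize([
    l_star(t, u_obs) * sum(PHI_EASY[y] * normal_pdf(t, PRIOR_MEAN[y], 1.0) * P_Y[y] for y in P_Y)
    for t in GRID])

# Ignoring the mechanism and using the marginal prior p(theta).
ignoring = normalize([
    l_star(t, u_obs) * sum(normal_pdf(t, PRIOR_MEAN[y], 1.0) * P_Y[y] for y in P_Y)
    for t in GRID])

# Ignoring the mechanism but conditioning on y, as in Theorem 4.4.
conditional = normalize([
    l_star(t, u_obs) * normal_pdf(t, PRIOR_MEAN["grade1"], 1.0) for t in GRID])
```

In this setup `posterior_mean(conditional)` agrees with `posterior_mean(correct)`, while `posterior_mean(ignoring)` is pulled toward the ability distribution of the grade that never receives the easy form.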

5.0 Adaptive Testing

As noted above, IRT measurement is more precise if an examinee is administered items that are informative in the neighborhood of the value of $\theta$ (Wainer et al., 1990). Adaptive testing uses an examinee’s preceding responses to select each next item to administer. Under the Rasch model, for example, an examinee answering items correctly would be administered successively more difficult items, and an examinee answering incorrectly would be administered successively easier items.

The datum observed in adaptive testing is a sequence of $N \le n$ ordered pairs, $S = ((I_1, U_{I_1}), \ldots, (I_N, U_{I_N}))$, where $I_k$ identifies the $k$th item administered and $U_{I_k}$ is the response to that item. Define the partial response sequence $S_k$ as the first $k$ ordered pairs in $S$, with the null sequence $s_0$ representing the status as the test begins. Testing may continue until a desired level of precision is reached, a predetermined number of items has been administered, or a specified number of correct or of incorrect responses has been observed. Augment the collection of items with the fictitious Item 0, the selection of which corresponds to a decision to terminate testing. It can be written as the $(N+1)$st item in the adaptive test, but no response is associated with it.

A test administrator defines an adaptive test design by specifying, for all items $j$ and all realizable partial response sequences $s_k$, the probabilities $\phi(j, s_k)$ that Item $j$ will be selected as the $(k+1)$st test item, after the partial response sequence $s_k$ has been observed from an examinee. Under Bayesian minimum variance item selection, for example, the as-yet-unadministered item that minimizes the expected posterior variance of $\theta$ with respect to the current distribution $p(\theta \mid s_k)$ is chosen as the $(k+1)$st item with probability one (Owen, 1975). Note that the value of $s$ conveys the value of $v$, because $m_j = 1$ if $i_k = j$ for some $k \in \{1, \ldots, N\}$ and $m_j = 0$ if not, and the responses to the administered items constitute $u_{\mathrm{obs}}$.

Theorem 5.1. The conditional distribution of response sequence $S$ is given by

$$\mathrm{Prob}(S = s \mid \theta) = f_\theta(u_{\mathrm{obs}}) \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}). \tag{5.1}$$

Proof. The probability of $S$ for an examinee with ability $\theta$ can be constructed sequentially. The probability of selection for the first item is $\phi(i_1, s_0)$. The probability of response to Item $i_1$ is given by the IRT model as $f_\theta(u_{i_1})$, which does not depend on the fact that Item $i_1$ happened to have been presented first. The probability of selection for the second item given $s_1$ is $\phi(i_2, s_1)$, which depends on the value of $u_{i_1}$ but not on $\theta$ given $u_{i_1}$. The probability of the corresponding response is $f_\theta(u_{i_2})$, again independent of the identification of, and the response to, the first item. Continuing in this manner through the decision to stop testing, or the selection of Item 0 as the $(N+1)$st item, yields

$$\begin{aligned}
\mathrm{Prob}(S = s \mid \theta) &= \prod_{k=1}^{N} f_\theta(u_{i_k}) \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}) \\
&= f_\theta(u_{\mathrm{obs}}) \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}). \qquad \square
\end{aligned}$$
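The sequential construction in this proof can be sketched for a small adaptive design. The design below is hypothetical: three Rasch items with invented difficulties, a fixed first item, a randomized branch to a harder or easier second item, and termination after two items. The function names are illustrative.

```python
import itertools
import math

BETAS = {1: 0.0, 2: -1.0, 3: 1.0}   # hypothetical difficulties; item 0 means "stop"

def rasch_prob(u, theta, beta):
    z = theta - beta
    return math.exp(u * z) / (1.0 + math.exp(z))

def select_prob(item, partial):
    """phi(j, s_k) for a hypothetical design; partial is a tuple of (item, response) pairs."""
    if len(partial) == 0:
        return 1.0 if item == 1 else 0.0             # always start with item 1
    if len(partial) == 1:
        first_correct = partial[0][1] == 1
        target, other = (3, 2) if first_correct else (2, 3)
        return {target: 0.9, other: 0.1}.get(item, 0.0)
    return 1.0 if item == 0 else 0.0                  # after two items, terminate

def sequence_prob(s, theta):
    """Prob(S = s | theta), built step by step as in the proof of Theorem 5.1."""
    prob = 1.0
    for k, (item, u) in enumerate(s):
        prob *= select_prob(item, s[:k])              # selection given the partial sequence
        prob *= rasch_prob(u, theta, BETAS[item])     # response given theta only
    return prob * select_prob(0, s)                   # decision to stop

def f_obs(s, theta):
    """f_theta(u_obs) for the items that appear in s."""
    out = 1.0
    for item, u in s:
        out *= rasch_prob(u, theta, BETAS[item])
    return out

def all_sequences():
    """Every realizable two-item sequence under this design."""
    for second in (2, 3):
        for u1, u2 in itertools.product((0, 1), repeat=2):
            yield ((1, u1), (second, u2))

total = sum(sequence_prob(s, 0.7) for s in all_sequences())
selection_parts = {s: sequence_prob(s, 0.7) / f_obs(s, 0.7) for s in all_sequences()}
```

The ratio stored in `selection_parts` is the product of selection probabilities in (5.1); recomputing it at another ability value gives the same number, which is the factorization that the theorem asserts.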


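Theorem 5.2. The missingness mechanism induced by adaptive testing satisfies MAR.

Proof. Since $s$ conveys the values of $m$ and $u_{\mathrm{obs}}$, the probability of $(m, u)$ given $\theta$ is the product of (i) the probability of observing a response sequence that implies $m$ and $u_{\mathrm{obs}}$, and (ii) the probability of $u_{\mathrm{mis}}$. Denote by $T_{m,u} = \{s : M = m \text{ and } U_{\mathrm{obs}} = u_{\mathrm{obs}}\}$ the set of response sequences with missingness patterns and observed item responses that match those of $(m, u)$. Then

$$\begin{aligned}
\mathrm{Prob}(M = m, U = u) &= \mathrm{Prob}(S \in T_{m,u},\, U_{\mathrm{mis}} = u_{\mathrm{mis}}) \\
&= \int \mathrm{Prob}(S \in T_{m,u} \mid \theta) \times \mathrm{Prob}(U_{\mathrm{mis}} = u_{\mathrm{mis}} \mid \theta)\, p(\theta)\, d\theta \\
&= \int \sum_{s \in T_{m,u}} f_\theta(u_{\mathrm{obs}}) \prod_{k=1}^{N+1} \phi(i_k, s_{k-1})\, f_\theta(u_{\mathrm{mis}})\, p(\theta)\, d\theta \\
&= \Bigl[ \sum_{s \in T_{m,u}} \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}) \Bigr] \int f_\theta(u_{\mathrm{obs}})\, f_\theta(u_{\mathrm{mis}})\, p(\theta)\, d\theta \\
&= \Bigl[ \sum_{s \in T_{m,u}} \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}) \Bigr] \times \mathrm{Prob}(U = u).
\end{aligned}$$

Since $g_\phi(m \mid u) = \mathrm{Prob}(M = m, U = u) / \mathrm{Prob}(U = u)$,

$$g_\phi(m \mid u) = \sum_{s \in T_{m,u}} \prod_{k=1}^{N+1} \phi(i_k, s_{k-1}),$$

which does not depend on $u_{\mathrm{mis}}$, as required to satisfy MAR. □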


Theorem 5.3. The missingness induced by adaptive testing is ignorable under direct likelihood and Bayesian inference.

Proof. By Theorem 5.1, the conditional probability of the observation $S$ given $\theta$ factors into two terms, namely $f_\theta(u_{\mathrm{obs}})$ and a term that does not depend on $\theta$. The term $L(\theta, \phi \mid v)$ in direct likelihoods and Bayesian posteriors thus reduces to $L^*(\theta \mid u_{\mathrm{obs}})$, as required for ignorability. □

Alternative Proof. The missingness mechanism $g$ in adaptive testing is specified a priori, and does not depend on $\theta$; thus, D is satisfied in both the likelihood and Bayesian senses. By Theorem 5.2, the missingness mechanism is also MAR. The sufficient conditions in Theorem 2.1 for ignorability are satisfied. □

Remark. Ignorability under direct likelihood inference means that $L^*(\theta \mid u_{\mathrm{obs}})$ yields the correct value of the MLE $\hat\theta$ from an observed $s$, but it does not justify the sampling-distribution interpretation of $\hat\theta$; that is, the correct point estimate is identified but no claims about its distribution in repeated samples for fixed $\theta$ necessarily follow. It can be shown that Rubin’s (1976) NS conditions for ignorability under sampling-distribution inference would require that the probability of any given missingness pattern be the same no matter what values the responses took (Mislevy & Wu, 1988). But since by definition adaptive tests produce missingness patterns as a function of the response values that are observed, only a degenerate adaptive testing scheme would satisfy this condition. Concluding that the item selection mechanism is not ignorable for sampling distribution inference in general means that the correct sampling distribution for $\hat\theta$ must be verified with respect to repeated administrations of the entire adaptive test. Chang and Ying (1996) consider the relation of the sampling variance of $\hat\theta$ to the second-order derivative of $L^*$, and offer some conditions under which the latter is a reasonable large-sample approximation of the former.

6.0 Not-Reached Items

IRT  is  intended  for  “power”  tests,  or  those  in  which  an  examinee’s  chances  of 
responding  correctly  would  not  differ  appreciably  if  the  time  limit  were  more  generous. 

Time  limits  are  typically  chosen  to  allow  most  examinees  to  respond  to  all  items,  but  some 
examinees  don’t  have  time  to  answer  all  of  them  (see  Yamamoto  &  Everson,  1995,  for 
analyses  of  the  situation  in  which  examinees  respond  in  accordance  with  an  IRT  model 
until time is nearly up, then switch to random responding). This section concerns the items
that  an  examinee  does  not  reach,  assuming  the  following  conditions: 

(i)  An  LI  IRT  model  would  give  response  probabilities  if  the  examinee  had  interacted 
meaningfully  with  all  the  items. 

(ii) The examinee has no information about the difficulty or content of the items at the
end of the test, but has decided instead to work from the beginning of the test
toward the end, answering all items along the way, until time expires.

(iii) All n items are administered. (If they are not, the results of this section must be
combined with those of Sections 3 or 4, as appropriate.)

Theorem 6.1. Under conditions (i)-(iii), the missingness induced by failing to reach the
end of the test because of time limitations satisfies MCAR, and therefore MAR as well.

Proof. Under conditions (i)-(iii), n+1 patterns of missingness can occur: For k = 0,...,n,
let $m^{(k)}$ denote the string of n-k 1's followed by k 0's. That is, $m^{(k)}$ is the missingness
pattern of an examinee who has not reached the last k items. The missingness process is
characterized by the examinee speed parameter $\phi = (\phi_0, \ldots, \phi_n)$, where
$\phi_k = \mathrm{Prob}(M = m^{(k)})$, the probability that the examinee will not reach the last k items, and

\[
g_\phi(m \mid u) =
\begin{cases}
\phi_k & \text{if } m = m^{(k)}, \\
0 & \text{otherwise.}
\end{cases}
\]

As required for MCAR, g does not depend on u. □
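The MCAR property in Theorem 6.1 can be illustrated with a short sketch. The function names and the example speed vector `phi` below are hypothetical and chosen only for illustration; the point is that the response vector u is passed in but never consulted.

```python
def not_reached_pattern(n, k):
    """Pattern m^(k): n-k ones (reached) followed by k zeros (not reached)."""
    return [1] * (n - k) + [0] * k


def g_not_reached(m, u, phi):
    """Probability of missingness pattern m under the not-reached mechanism.

    phi[k] is the probability of not reaching the last k items.
    The responses u are accepted only to make the point that they are
    never used: the mechanism is MCAR.
    """
    n = len(m)
    for k in range(n + 1):
        if m == not_reached_pattern(n, k):
            return phi[k]
    return 0.0  # patterns with "holes" cannot arise from running out of time


phi = [0.6, 0.25, 0.1, 0.05]           # hypothetical speed parameter for n = 3
pattern = not_reached_pattern(3, 1)    # the last item is not reached
same = g_not_reached(pattern, [1, 0, 1], phi) == g_not_reached(pattern, [0, 1, 0], phi)
```

Here `same` compares g for two different hypothetical complete response vectors under one pattern, which is the sense in which g does not depend on u.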

Theorem 6.2. Under conditions (i)-(iii), the missingness induced by failing to reach the
end of the test because of time limitations is ignorable under direct likelihood inference if all
values of $\theta$ can occur at all values of $\phi$.

Proof. The requirement that all values of $\theta$ can occur at all values of $\phi$ is D, as it pertains
to direct likelihood inference. By Theorem 6.1, the missingness is MAR under conditions
(i)-(iii). By Theorem 2.1, D and MAR give ignorability under direct likelihood inference. □

Theorem 6.3. Under conditions (i)-(iii), the missingness induced by failing to reach the
end of the test because of time limitations is ignorable under Bayesian inference if
$p(\theta, \phi) = p(\theta)\,p(\phi)$.

Proof. The requirement that $p(\theta, \phi) = p(\theta)\,p(\phi)$ is D, as it pertains to Bayesian inference.
Under conditions (i)-(iii), the missingness is MAR. By Theorem 2.1, these two conditions
imply ignorability under Bayesian inference. □

Theorem 6.4. Under conditions (i)-(iii), the missingness induced by failing to reach the
end of the test because of time limitations is ignorable under Bayesian inference only if, for
each k = 0,...,n, the expected value of $\phi_k$ is constant across all values of $\theta$.

Proof. For a given not-reached pattern $m^{(k)}$, Equation (2.9), the NS condition for
ignorability under Bayesian inference, simplifies to constancy in $\theta$ of
$\int \phi_k \, p(\phi \mid \theta) \, d\phi = E(\phi_k \mid \theta)$, because g does not depend on u. □

Remark. Since independence of $\theta$ and $\phi$ implies, but is not implied by, constant
conditional means for $\phi$, Theorem 6.4 gives a weaker condition than Theorem 6.3 for
ignorability of not-reached items under Bayesian inference.

By Theorem 6.2, if conditions (i)-(iii) hold then not-reached responses are
ignorable under direct likelihood inference; in particular, the correct value is obtained for
the MLE. For ignorability to hold under Bayesian inference, however, it is further
necessary that the expected value of each speededness subparameter $\phi_k$ be constant across all
values of $\theta$ (Theorem 6.4). Empirical evidence suggests that this is not generally true.
Van den Wollenberg (1979), for example, reports significant positive correlations between
percent-correct scores on the first eleven items (which were reached by all examinees) and
the total number of items reached, in four of six intelligence tests in the ISI battery
(Snijders, Souren, and Welten, 1963). Bayesian inference about $\theta$ would take this
relationship into account by using the correct posterior distribution, or
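\[
\begin{aligned}
p(\theta, \phi \mid v) &\propto p(v \mid \theta, \phi)\, p(\theta, \phi) \\
&= \Big[ \int p(m, u \mid \theta, \phi)\, du_{\mathrm{mis}} \Big]\, p(\theta, \phi) \\
&= f_\theta(u_{\mathrm{obs}})\, g_\phi(m)\, p(\theta, \phi) \qquad \text{[by MCAR]} \\
&= L^*(\theta \mid u_{\mathrm{obs}})\, g_\phi(m)\, p(\theta, \phi).
\end{aligned}
\]

Further, the correct marginal posterior for $\theta$ is obtained as

\[
\begin{aligned}
p(\theta \mid v) &\propto \int L^*(\theta \mid u_{\mathrm{obs}})\, g_\phi(m)\, p(\theta, \phi)\, d\phi \\
&= L^*(\theta \mid u_{\mathrm{obs}}) \Big[ \int p(m \mid \phi)\, p(\phi \mid \theta)\, d\phi \Big]\, p(\theta) \\
&= L^*(\theta \mid u_{\mathrm{obs}})\, p(m \mid \theta)\, p(\theta) \\
&\propto L^*(\theta \mid u_{\mathrm{obs}})\, p(\theta \mid m).
\end{aligned}
\]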
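The role of the factor p(m | θ) in the marginal posterior can be seen in a small grid approximation. This is only a sketch under loud assumptions: a Rasch model stands in for the LI IRT model, the prior is standard normal, and `p_not_finished` is a hypothetical form for p(m | θ) that is not taken from the source.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under a Rasch model (assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_not_finished(theta, slope):
    """Hypothetical p(m | theta) for a pattern with at least one item not reached.

    Assumption for illustration: the chance of failing to finish falls with
    ability when slope > 0, and is flat in theta when slope == 0.
    """
    return 1.0 / (1.0 + math.exp(slope * theta))


def posterior_mean(u_obs, b_obs, slope):
    """Grid approximation to E(theta | v) from L*(theta | u_obs) p(m | theta) p(theta)."""
    grid = [-4.0 + 0.05 * i for i in range(161)]
    weights = []
    for theta in grid:
        like = 1.0
        for u, b in zip(u_obs, b_obs):
            p = rasch_p(theta, b)
            like *= p if u == 1 else 1.0 - p
        prior = math.exp(-0.5 * theta * theta)   # standard normal kernel
        weights.append(like * p_not_finished(theta, slope) * prior)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


u_obs, b_obs = [1, 0, 1, 1], [-1.0, 0.0, 0.5, 1.0]        # hypothetical data
mean_ignoring = posterior_mean(u_obs, b_obs, slope=0.0)   # m carries no information
mean_modeled = posterior_mean(u_obs, b_obs, slope=1.5)    # speed and ability related
```

With `slope=0.0` the factor p(m | θ) is constant and cancels, which reproduces the analysis that ignores the missingness. With a positive slope, the same observed responses yield a lower posterior mean for an examinee who did not finish.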

7.0  Intentionally  Omitted  Items 

A  missing  response  is  an  intentional  omission  when  an  examinee  is  administered 
the  item,  has  time  to  appraise  it,  and  decides  for  whatever  reason  not  to  respond.  After 
arguing  that  such  omissions  can’t  generally  be  considered  ignorable,  we  discuss  ways  to 
deal  with  them.  Several  solutions  suggested  in  the  test  theory  literature  and  an  approach 
suggested  by  the  present  analyses  are  considered. 




7.1  Omitting  Behavior  and  Ignorability 

Used with tests of dichotomous items, a test score T(v) summarizes a pattern of
rights, wrongs, and omits for the purposes of comparing, selecting, or describing
examinees. Formula scores take the form

\[ T(v) = R(v) - C\,W(v), \tag{7.1} \]

where R(v) and W(v) are counts of right and wrong responses and C, with $0 \le C \le 1$, is a
constant selected by the test administrator. Setting C = 0 gives number-right scores; C = 1
gives right-minus-wrong scores; and for multiple-choice items with A alternatives,
C = 1/(A - 1) gives the familiar “corrected-for-guessing” scores.
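A minimal sketch of formula scoring as in (7.1) follows. The encoding of an omit as `None` and the function name are assumptions made for the example.

```python
def formula_score(v, C):
    """T(v) = R(v) - C * W(v) for a pattern of rights (1), wrongs (0), and omits (None)."""
    right = sum(1 for x in v if x == 1)
    wrong = sum(1 for x in v if x == 0)
    return right - C * wrong


v = [1, 1, 0, None, 0, 1]                   # hypothetical pattern: 3 right, 2 wrong, 1 omit
number_right = formula_score(v, 0.0)
right_minus_wrong = formula_score(v, 1.0)
corrected = formula_score(v, 1.0 / (5 - 1))  # A = 5 alternatives
```

Omits contribute nothing to either count, so the three scoring rules differ only in how heavily wrong responses are penalized.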

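Theorem 7.1.1. For a set of items for which a SMURFLI(2) IRT model holds,
$E\{T(U) \mid \theta\}$ is increasing in $\theta$.

Proof.

\[
\begin{aligned}
E\{T(U) \mid \theta\} &= E\{R(U) - C\,W(U) \mid \theta\} \\
&= \sum_{j=1}^{n} \mathrm{Prob}(U_j = 1 \mid \theta) - C \sum_{j=1}^{n} \big[1 - \mathrm{Prob}(U_j = 1 \mid \theta)\big] \\
&= (1 + C) \sum_{j=1}^{n} \mathrm{Prob}(U_j = 1 \mid \theta) - Cn.
\end{aligned}
\]

By monotonicity, $\theta' > \theta'' \Rightarrow \mathrm{Prob}(U_j = 1 \mid \theta') > \mathrm{Prob}(U_j = 1 \mid \theta'')$ for all items j, so the final
equation implies that $\theta' > \theta'' \Rightarrow E\{T(U) \mid \theta'\} > E\{T(U) \mid \theta''\}$. □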
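The closed form in the proof of Theorem 7.1.1 can be checked numerically. The sketch below assumes a two-parameter logistic model as one example of a strictly monotone item response function; the item parameters are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Two-parameter logistic response function (an assumed monotone IRT model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_formula_score(theta, items, C):
    """E{T(U) | theta} = (1 + C) * sum_j P_j(theta) - C * n."""
    n = len(items)
    return (1.0 + C) * sum(p_2pl(theta, a, b) for a, b in items) - C * n


items = [(1.0, -1.0), (0.8, 0.0), (1.5, 0.7)]   # hypothetical (a, b) pairs
thetas = [-3.0 + 0.5 * i for i in range(13)]
scores = [expected_formula_score(t, items, C=0.25) for t in thetas]
increasing = all(later > earlier for earlier, later in zip(scores, scores[1:]))
```

Because each P_j(θ) is strictly increasing and the coefficient (1 + C) is positive, `increasing` holds for any C in [0, 1] under this assumed model.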

Theorem 7.1.2. For any partitioning of items inducing $U = (U', U'')$,
$E\{T(U)\} = E\{T(U')\} + E\{T(U'')\}$. Similarly, $E\{T(V)\} = E\{T(V')\} + E\{T(V'')\}$,
$T(u) = T(u') + T(u'')$, and $T(v) = T(v') + T(v'')$.

Proof. These results follow from the linearity of the definition of T.

Since a correct response to an item gives a higher test score than an incorrect
response, examinees who want to obtain high scores will make responses they think are
correct. How examinees will respond to an item about which they are unsure depends at
least partly on how the test will be scored (Sabers and Feldt, 1968). They maximize their
expected scores by answering items for which they think their probability of being correct
is at least C/(1 + C). Under number-right scoring, they should answer every item; under
corrected-for-guessing scoring, they should answer those for which they think the probability
is at least 1/A; and under right-minus-wrong scoring, they should answer those for which
they think the probability is at least 1/2 (C/(1 + C) when C = 1). Examinees may differ in
the accuracy of their estimates, their confidence about them, and their propensities to omit
rather than make responses about which they are uncertain. Such characteristics of an
examinee, as they are evoked under given test administration conditions, constitute the
missingness parameter $\phi$ in the case of intentional omission. For example, analyzing
responses to items that examinees originally omitted under right-minus-wrong scoring,
Sherriffs and Boomer (1954) found that about half of the omitted responses would have
been correct among examinees who scored low on a risk-aversion scale, but nearly
two-thirds would have been correct among examinees with high risk-aversion scores. Intuition
and empirical evidence (e.g., Stocking, Eignor, & Cook, 1988) suggest that the following
three conditions typify omitting behavior in educational testing:

(iv) For any given $\theta$ and a given item, examinees are more likely to omit items when
they think their answers would be incorrect than when they think their answers
would be correct.

(v) As $\theta$ increases, an examinee is more likely to recognize when a response would be
correct, and is thus less likely to omit it; i.e., for all items j,

\[ \theta'' > \theta' \;\Rightarrow\; g_{\phi,\theta''}(M_j = 0 \mid U_j = 1) < g_{\phi,\theta'}(M_j = 0 \mid U_j = 1), \]

where $\theta$ has been made explicit as a subscript of g to emphasize the dependence of
omitting behavior on $\theta$.

(vi) Similarly, as $\theta$ increases, an examinee is more likely to recognize when a response
would be incorrect, and is thus more likely to omit it; i.e., for all items j,

\[ \theta'' > \theta' \;\Rightarrow\; g_{\phi,\theta''}(M_j = 0 \mid U_j = 0) > g_{\phi,\theta'}(M_j = 0 \mid U_j = 0). \]
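The answering threshold C/(1 + C) cited before conditions (iv)-(vi) follows from comparing the expected score for answering with the score of zero for omitting. The sketch below works this through; the function names and the choice of A = 5 are illustrative assumptions.

```python
def expected_gain(p, C):
    """Expected change in formula score from answering rather than omitting,
    given perceived probability p of being correct: p * (+1) + (1 - p) * (-C)."""
    return p - C * (1.0 - p)


def answer_threshold(C):
    """Smallest p for which answering does not lower the expected score."""
    return C / (1.0 + C)


A = 5   # hypothetical number of alternatives
thresholds = {
    "number-right": answer_threshold(0.0),
    "corrected-for-guessing": answer_threshold(1.0 / (A - 1)),
    "right-minus-wrong": answer_threshold(1.0),
}
```

Setting `expected_gain(p, C)` to zero and solving for p gives the threshold, which reduces to 1/A when C = 1/(A - 1).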

Theorem 7.1.3. For a given $u_{\mathrm{obs}}$ and an m in which missing responses are due to
intentional omission, the missingness process is MAR only if this missingness pattern is
equally likely for all values of $u_{\mathrm{mis}}$.

Proof. This follows immediately from the definition of MAR. □

Remarks. (a) If Condition (iv) above holds, examinees are more likely to omit items when
they think their responses would be wrong rather than right. MAR would imply that their
perceived probabilities of correctness are independent of the probabilities given by the IRT
model; that is, Conditions (v) and (vi) could not also hold. (b) MAR (along with D) is
merely sufficient for ignorability, and ignorability can hold when MAR does not. If
ignorability  does  hold,  though,  the  following  counter-intuitive  condition  holds  for  expected 
score  of  the  omitted  responses. 

Theorem 7.1.4. Suppose a SMURFLI(2) IRT model holds in a domain of items, and the
missingness induced by intentional omission is ignorable under likelihood or Bayesian
inference. For any given pattern of omits (m) and responses to observed items ($u_{\mathrm{obs}}$), the
probability of this missingness pattern is (a) constant with respect to $\theta$, even though (b)
the expected score of the missing responses $T(U_{\mathrm{mis}})$ is increasing with respect to $\theta$.

Proof. Regarding (a): The NS condition for ignorability under likelihood inference (2.8)
requires that for each value of $\phi$, the value of
$E_{u_{\mathrm{mis}}}\{g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi\}$ (the
probability that missingness pattern m will be observed given $u_{\mathrm{obs}}$, $\phi$, and $\theta$) is constant
with respect to $\theta$. Regarding (b): By local independence, the distribution of $U_{\mathrm{mis}}$ (given
m) depends on $\theta$, and not on $\phi$ or $u_{\mathrm{obs}}$. By Theorem 7.1.1, $E\{T(U_{\mathrm{mis}}) \mid \theta\}$ is increasing
in $\theta$. The same argument holds for ignorability under Bayesian inference, since, by
Theorem 2.3, it is required that the expectation of (2.8) over $\phi$, or (2.9), is constant with
respect to $\theta$. □

Remark. Theorem 7.1.4 says that if ignorability holds under a SMURFLI(2) IRT model,
high-ability examinees are just as likely to produce any given missingness pattern as
low-ability examinees (given $u_{\mathrm{obs}}$), even though the missing responses have a greater expected
contribution to their total test score. This result belies (iv)-(vi), since higher-ability
examinees are more likely to make correct responses to the missing items, more likely to
recognize they are correct, and more likely to make the responses rather than omit them.

Corollary 7.1.5. Suppose a SMURFLI(2) IRT model holds for a one-item domain, and
the missingness induced by intentional omission is ignorable under likelihood or Bayesian
inference. The probability that the response will be omitted is constant with respect to $\theta$
even though the probability that it will be answered correctly increases with $\theta$.

7.2  Modeling  Intentional  Omissions 

Since ignorability is not generally satisfied for direct likelihood inference, it is
necessary to base inference on $p(v \mid \theta, \phi)$ when omitting is a possibility. Accepting that
omits are not ignorable means $g_\phi(m \mid u)$ depends on $u_{\mathrm{mis}}$, and cannot be determined from
observations of V alone. Implementing any approach for modeling intentional omission
thus requires that the analyst either specify the mechanism of the omitting process a
priori, or estimate it from an experiment with the same items and similar examinees in
which the values of item responses that were originally omitted are subsequently obtained.
Section 7.2.1 presents an approach from the perspective of Rubin’s model, and Sections
7.2.2-7.2.4 evaluate various alternative approaches that have appeared in the literature.

7.2.1  An  Approach  Based  on  Rubin’s  Model 


Rubin’s framework begins with a full model for response and omission, (U, M),
using the form $\mathrm{Prob}(U = u, M = m \mid \theta, \phi) = g_\phi(m \mid u)\, f_\theta(u)$ with $f_\theta(u)$ an LI IRT model.
This section sketches some specifics about how this approach might be implemented,
bearing in mind that $g_\phi(m \mid u)$ cannot be estimated from observations of V alone because its
value depends on $u_{\mathrm{mis}}$. The main ideas are (i) viewing the missingness parameter $\phi$ as
the concatenation of item-specific missingness parameters $\eta_j$ and examinee-specific
parameters $\omega$, and (ii) assuming conditional independence across the expanded item
responses $(u_j, m_j)$ given the extended examinee parameter $(\theta, \omega)$ and extended item
parameters $(\beta_j, \eta_j)$. Specifically, it may be posited that

\[ \mathrm{Prob}(U = u, M = m \mid \theta, \phi) = \prod_{j=1}^{n} g(m_j \mid u_j, \theta, \omega, \eta_j)\, f(u_j \mid \theta, \beta_j), \tag{7.2} \]

where

\[ g(m_j \mid u_j, \theta, \omega, \eta_j) = \mathrm{Prob}(M_j = m_j \mid U_j = u_j, \theta, \omega, \eta_j). \tag{7.3} \]


For dichotomous items, one plausible form for (7.3) is a pair of linear logistic
regression functions for the probability of omitting incorrect and correct responses, given
an item’s tendency to provoke omission ($\eta_j$), the examinee’s ability ($\theta$), and the
examinee’s tendency to omit ($\omega$):

\[
\mathrm{Prob}(M_j = 0 \mid U_j = u_j, \theta, \omega, \eta_j) =
\begin{cases}
\Psi(\eta_{j00} + \eta_{j01}\theta + \omega) & \text{if } u_j = 0, \\
\Psi(\eta_{j10} + \eta_{j11}\theta + \omega) & \text{if } u_j = 1,
\end{cases}
\]

where $\Psi(z) = \exp(z)/[1 + \exp(z)]$. More ambitious models would allow for the
dependence of g on covariates y as well, since empirical evidence suggests that omitting
behavior can vary systematically with factors such as gender and culture (e.g., Wolf, 1977,
p. 33). With g sufficiently specified in this or some other manner, and in conjunction with
an IRT model, likelihood and Bayesian inference about $\theta$ from v can proceed from (2.4)
or (2.5), respectively.
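One way the logistic omission model and the item-wise product in (7.2) might be coded is sketched below. The dictionary layout for each item's η parameters, the numeric values, and the precomputed list `p_correct` of P_j(θ) values are all assumptions for illustration.

```python
import math


def logistic(z):
    """Psi(z) = exp(z) / [1 + exp(z)]."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_omit(u_j, theta, omega, eta_j):
    """Prob(M_j = 0 | U_j = u_j, theta, omega, eta_j) in the logistic form of (7.3).

    eta_j maps the (possibly unobserved) response u_j to an
    (intercept, slope) pair; this layout is an illustrative assumption.
    """
    intercept, slope = eta_j[u_j]
    return logistic(intercept + slope * theta + omega)


def joint_prob(u, m, theta, omega, etas, p_correct):
    """Prob(U = u, M = m | theta, phi) as the product over items in (7.2).

    p_correct[j] is P_j(theta) from whatever LI IRT model is assumed.
    """
    total = 1.0
    for u_j, m_j, eta_j, p_j in zip(u, m, etas, p_correct):
        f = p_j if u_j == 1 else 1.0 - p_j
        omit = prob_omit(u_j, theta, omega, eta_j)
        g = omit if m_j == 0 else 1.0 - omit
        total *= g * f
    return total


# Hypothetical values: omission is likelier for would-be-wrong responses (u_j = 0).
etas = [{0: (-0.5, 0.4), 1: (-2.0, -0.6)}, {0: (0.0, 0.3), 1: (-1.5, -0.5)}]
p_correct = [0.7, 0.4]
prob = joint_prob([1, 0], [1, 0], theta=0.3, omega=0.2, etas=etas, p_correct=p_correct)
```

Note that `joint_prob` needs the full response vector u, including responses that would have been omitted, which is why the mechanism cannot be estimated from V alone.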
7.2.2 Filling in the Blanks

Lord (1974) suggested that omits on dichotomously-scored multiple-choice items
under guessing-corrected scoring can be handled with standard IRT estimation routines if
they are treated as fractionally correct, with the value c = 1/(# alternatives). He assumed
“rational” omitting behavior: Examinees omit items only if their chances of responding
correctly would have been c, so $\mathrm{Prob}(U_j = 1 \mid M_j = 0) = c$ for all items and all $\theta$. Now the
log likelihood to be maximized to obtain $\hat\theta$ if there is no possibility of missing responses is

\[ \ell(\theta \mid u) = \delta_\theta \sum_{j=1}^{n} \big[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \big], \tag{7.4} \]

where $P_j(\theta) = \mathrm{Prob}(U_j = 1 \mid \theta, \beta_j)$ and $Q_j(\theta) = 1 - P_j(\theta)$. Lord proposed maximizing the
pseudo log likelihood given by

\[ \ell^{**}(\theta \mid v) = \delta_\theta \sum_{j=1}^{n} \big[ w_j \log P_j(\theta) + (1 - w_j) \log Q_j(\theta) \big], \tag{7.5} \]

where $w_j = u_j$ if $m_j = 1$ and $w_j = c$ if $m_j = 0$.

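Lemma 7.2.1. The likelihood based on the hypothetical complete data can be written as
the product of two factors, one of which involves only u and $\theta$, and not the missingness
process $\phi$ or m.

Proof. By definitions and elementary properties of probability,

\[ \mathrm{Prob}(U = u, M = m \mid \theta, \phi) = \mathrm{Prob}(M = m \mid U = u, \theta, \phi)\, \mathrm{Prob}(U = u \mid \theta, \phi) = g_\phi(m \mid u)\, f_\theta(u). \]

Viewed as a likelihood,

\[ L(\theta, \phi \mid u, m) = \delta_{\theta\phi}\, L_1(\theta, \phi \mid u, m) \times \delta_\theta\, L_2(\theta \mid u), \]

with $L_1(\theta, \phi \mid u, m) = g_\phi(m \mid u)$ and $L_2(\theta \mid u) = f_\theta(u)$. □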

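Theorem 7.2.2. Consider a domain of multiple-choice items for which a SMURFLI(2)
IRT model holds, but for which examinees may intentionally omit responses. If
$\mathrm{Prob}(U_j = 1 \mid M_j = 0) = c$ for all items and all $\theta$, then Lord’s (1974) pseudo log likelihood
(7.5) is the expectation of the log of a conditional likelihood function for $\theta$; specifically,

\[ \ell^{**}(\theta \mid v) = E_{u_{\mathrm{mis}}}\{ \log L_2(\theta \mid u) \mid u_{\mathrm{obs}} \}, \]

where $L_2$ is given in Lemma 7.2.1.

Proof.

\[
\begin{aligned}
E_{u_{\mathrm{mis}}}\{ \log L_2(\theta \mid u) \mid u_{\mathrm{obs}} \}
&= E_{u_{\mathrm{mis}}}\Big\{ \sum_{j: m_j = 1} \big[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \big] + \sum_{j: m_j = 0} \big[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \big] \,\Big|\, u_{\mathrm{obs}} \Big\} \\
&= \sum_{j: m_j = 1} \big[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \big] + \sum_{j: m_j = 0} \big[ c \log P_j(\theta) + (1 - c) \log Q_j(\theta) \big].
\end{aligned}
\]

This final expression, multiplied by $\delta_\theta$, is Lord’s $\ell^{**}(\theta \mid v)$. □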
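The identity in Theorem 7.2.2 can be checked numerically for a single omitted item. The sketch below assumes a two-parameter logistic model with hypothetical item parameters, encodes an omit as `None`, and leaves out the indicator factor δ by taking θ to lie inside the parameter space.

```python
import math


def p_2pl(theta, a, b):
    """Two-parameter logistic response function (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def loglik(theta, w, items):
    """Sum of w_j log P_j + (1 - w_j) log Q_j; w_j may be fractional."""
    total = 0.0
    for w_j, (a, b) in zip(w, items):
        p = p_2pl(theta, a, b)
        total += w_j * math.log(p) + (1.0 - w_j) * math.log(1.0 - p)
    return total


def pseudo_loglik(theta, v, items, c):
    """Pseudo log likelihood (7.5): omits (None) are scored as fractionally correct."""
    w = [c if v_j is None else v_j for v_j in v]
    return loglik(theta, w, items)


items = [(1.0, -0.5), (1.2, 0.3), (0.9, 1.0)]   # hypothetical (a, b) pairs
v = [1, None, 0]                                # second item omitted
c, theta = 0.25, 0.4
lhs = pseudo_loglik(theta, v, items, c)
# Expectation over the one missing response, which is correct with probability c.
rhs = c * loglik(theta, [1, 1, 0], items) + (1 - c) * loglik(theta, [1, 0, 0], items)
```

The two quantities agree because the complete-data log likelihood is linear in each u_j, so taking the expectation simply replaces the missing u_j by c.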

Remarks. (a) It is a standard technique in likelihood estimation to eliminate nuisance
variables from a problem by factoring the likelihood, and basing inference on only those
factors which do not involve the nuisance variable. Doing so yields “limited information”
inferences, so called because they forego information about the target parameter contained
in the neglected factors. In this case, Lord’s solution does not use information about $\theta$
conveyed by m through $L_1$. (b) Maximizing the expected value of the log likelihood of $L_2$
with respect to the missing responses is an instance of the general approach to inference
with missing data described in Dempster, Laird, and Rubin (1977); specifically, it is a
one-step “EM” solution. (c) Taken together, (a) and (b) justify Lord’s (1974) maximization of
(7.5) as an “expected limited-information” MLE for $\theta$.

The foregoing analysis yields insight into other treatments of omits that impute
values for $u_{\mathrm{mis}}$. Supplying random responses that are correct with probability c provides a
crude numerical approximation of (7.5), leading to a maximizing value which has the same
expectation as when the integration is carried out in closed form in the proof of Theorem
7.2.2. This practice is justified by the same assumptions as Lord’s (1974) approach.
Supplying incorrect responses for omits also leads to an “expected limited-information”
MLE for $\theta$, but under the assumption that responses to omitted items would surely have
been incorrect; that is, $\mathrm{Prob}(U_j = 1 \mid M_j = 0) = 0$ for all items and all $\theta$. This is implausible
for multiple-choice items for which even the least able examinees have nontrivial
probabilities of success through guessing, so in this case supplying incorrect responses for
omits biases inference about $\theta$ downward.

Lord addressed “rational” omitting behavior, in that the expectation of correctness
for an omitted response is always c, the value associated with the optimal omitting strategy.
As noted, however, studies of responses to items originally omitted show that not all
examinees behave in this manner. The tendency to omit when probabilities of success are
higher than c can be associated with personality characteristics, demographic
variables, and level of ability. Lord’s approach biases estimates of $\theta$ downward for
risk-aversive examinees. Although Section 7.2.1 showed how such dependencies can be taken
into account, it is by no means certain that this should be done; to do so effectively adjusts
scores upward or downward in accordance with examinee background characteristics,
which may be objectionable on the grounds of fairness. Assuming rational omitting
behavior in scoring rules, and making the rules and optimal strategies as clear as possible
to examinees, is probably preferable when test scores are used to make sensitive placement
or selection decisions.
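The effect of the imputed value on the resulting estimate can be illustrated with a grid search over the pseudo log likelihood. This sketch assumes a two-parameter logistic model with hypothetical item parameters and a hypothetical response pattern; it is not a reproduction of any estimation routine cited above.

```python
import math


def p_2pl(theta, a, b):
    """Two-parameter logistic response function (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pseudo_loglik(theta, v, items, omit_value):
    """Log likelihood with each omit (None) replaced by the score omit_value."""
    total = 0.0
    for v_j, (a, b) in zip(v, items):
        w = omit_value if v_j is None else v_j
        p = p_2pl(theta, a, b)
        total += w * math.log(p) + (1.0 - w) * math.log(1.0 - p)
    return total


def grid_mle(v, items, omit_value):
    """Maximize the pseudo log likelihood over a fixed grid of theta values."""
    grid = [-4.0 + 0.01 * i for i in range(801)]
    return max(grid, key=lambda t: pseudo_loglik(t, v, items, omit_value))


items = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
v = [1, 1, 0, None, None]                                # two omits
theta_fractional = grid_mle(v, items, omit_value=0.25)   # omits scored as c = 1/4
theta_as_wrong = grid_mle(v, items, omit_value=0.0)      # omits scored as incorrect
```

Scoring omits as incorrect lowers every imputed w_j relative to fractional scoring, so the maximizing θ moves downward under this assumed model.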

7.2.3  Lord’s  (1983)  Model  for  Omits 

Lord’s (1974) treatment of omits as fractionally correct neither presented nor
exploited the full likelihood induced by the data. Lord’s (1983) model for omits maintains
the context of guessing-corrected scoring of multiple-choice items with A = 1/c alternatives,
but offers additional structure for the response process. The model first assumes that an
examinee either feels a preference for one of the alternatives or is totally undecided among
them. The proportion of examinees with ability $\theta$ feeling no preference on Item j is
$R_j(\theta)$. If a preference is felt, a response is made; and of the responses made by examinees
with ability $\theta$ who feel a preference, the proportion correct is $P_j^*(\theta)$. If no preference is
felt, the examinee will either omit the item with probability $\omega$ (an examinee missingness
parameter), or respond at random. Note that $(\theta, \omega)$ constitutes the missingness parameter
$\phi$ in Rubin’s notation. Responses and omitting decisions are assumed independent over
items, given $\theta$ and $\omega$.

This model does not address the correctness or incorrectness of hypothetical
responses to omitted items; that is, $U_j$ is undefined when $M_j = 0$. In order to analyze the
approach from Rubin’s (1976) perspective, we extend it in concert with Lord’s hypothesis
that an examinee who omits is totally undecided among the alternatives, by positing the
following condition:

(vii) For all items j, $\mathrm{Prob}(U_j = 1 \mid M_j = 0, \theta, \omega) = c$ and
$\mathrm{Prob}(U_j = 0 \mid M_j = 0, \theta, \omega) = 1 - c$ for all $\theta$ and $\omega$;

$U_{\mathrm{mis}}$ is thus independent of $\theta$ and $\omega$. Assuming Condition (vii), the tree in Figure 3
shows the conditional probabilities of arriving at $(m_j, u_j)$ in the different possible ways.

[Figure  3  about  here] 

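The likelihood function for the observed data can then be written as follows:

\[
\begin{aligned}
\mathrm{Prob}(V = v \mid \theta, \omega)
&= \mathrm{Prob}(M = m \mid \theta, \omega) \int \prod_{j=1}^{n} \mathrm{Prob}(U_j = u_j \mid \theta, \omega, M_j = m_j)\, du_{\mathrm{mis}} \\
&= \{\mathrm{Prob}(M = m \mid \theta, \omega)\} \Big\{ \prod_{j: m_j = 1} \mathrm{Prob}(U_j = u_j \mid \theta, \omega, M_j = 1) \Big\} \Big\{ \prod_{j: m_j = 0} \sum_{u_j} \mathrm{Prob}(U_j = u_j \mid \theta, \omega, M_j = 0) \Big\} \\
&= \{\mathrm{Prob}(M = m \mid \theta, \omega)\} \Big\{ \prod_{j: m_j = 1} \mathrm{Prob}(U_j = u_j \mid \theta, \omega, M_j = 1) \Big\} \Big\{ \prod_{j: m_j = 0} [c + (1 - c)] \Big\} \\
&= \{\mathrm{Prob}(M = m \mid \theta, \omega)\} \Big\{ \prod_{j: m_j = 1} \mathrm{Prob}(U_j = u_j \mid \theta, \omega, M_j = 1) \Big\} \\
&= \Big\{ \prod_{j=1}^{n} [\omega R_j(\theta)]^{1 - m_j} [1 - \omega R_j(\theta)]^{m_j} \Big\} \Big\{ \prod_{j: m_j = 1} P_j^{**}(\theta, \omega)^{u_j} [1 - P_j^{**}(\theta, \omega)]^{1 - u_j} \Big\},
\end{aligned}
\tag{7.6}
\]

where $P_j^{**}(\theta, \omega) = \mathrm{Prob}(U_j = 1 \mid \theta, \omega, M_j = 1)$, or the conditional probability that a response
will be correct given that it is observed, is the normalized sum of the probabilities of making a correct response
when a preference is felt and of guessing correctly when a preference is not felt but a
random response is made:

\[ P_j^{**}(\theta, \omega) = [1 - \omega R_j(\theta)]^{-1} \big\{ [1 - R_j(\theta)]\, P_j^*(\theta) + c\,(1 - \omega)\, R_j(\theta) \big\}. \]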

Once V is observed, (7.6) provides a foundation for likelihood and Bayesian
inference about $\theta$ and $\omega$. Lord suggested that the model could be implemented by
specifying functional forms for $P_j^*$ and $R_j$, such as the 3-parameter logistic IRT function
for $P_j^*$ and the 2-parameter logistic with a negative slope for $R_j$.

Theorem 7.2.3. Assuming Condition (vii), the model for $U_j$ implicit in Lord’s (1983)
model for omits can be written in terms of the unidimensional ability variable $\theta$ as follows:

\[
\mathrm{Prob}(U_j = u_j \mid \theta) =
\begin{cases}
c\,R_j(\theta) + [1 - R_j(\theta)]\, P_j^*(\theta) & \text{if } u_j = 1, \\
(1 - c)\,R_j(\theta) + [1 - R_j(\theta)]\,[1 - P_j^*(\theta)] & \text{if } u_j = 0.
\end{cases}
\tag{7.7}
\]

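Proof.

\[
\begin{aligned}
P(U_j = 1 \mid \theta, \omega) &= \sum_{k=0}^{1} P(U_j = 1 \mid \theta, \omega, M_j = k)\, P(M_j = k \mid \theta, \omega) \\
&= P(U_j = 1 \mid \theta, \omega, M_j = 1)\, P(M_j = 1 \mid \theta, \omega) + P(U_j = 1 \mid \theta, \omega, M_j = 0)\, P(M_j = 0 \mid \theta, \omega) \\
&= \big\{ c\,(1 - \omega)\, R_j(\theta) + [1 - R_j(\theta)]\, P_j^*(\theta) \big\} + c\,\omega\, R_j(\theta) \\
&= c\,R_j(\theta) + [1 - R_j(\theta)]\, P_j^*(\theta).
\end{aligned}
\]

Note that $\omega$ drops out of the expression. Then

\[
\begin{aligned}
P(U_j = 0 \mid \theta, \omega) &= 1 - P(U_j = 1 \mid \theta, \omega) \\
&= 1 - c\,R_j(\theta) - [1 - R_j(\theta)]\, P_j^*(\theta) \\
&= (1 - c)\,R_j(\theta) + [1 - R_j(\theta)]\,[1 - P_j^*(\theta)].
\end{aligned}
\]

□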
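The branch probabilities of Lord's (1983) model under Condition (vii), the conditional probability P**, and the cancellation of ω in Theorem 7.2.3 can be checked together. In the sketch below, the particular logistic forms and parameter values for R_j and P*_j are hypothetical, chosen to follow the general shapes Lord suggested.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def r_no_preference(theta):
    """Hypothetical R_j(theta): 2-parameter logistic with a negative slope."""
    return logistic(-1.2 * (theta + 0.5))


def p_star(theta, lower=0.25):
    """Hypothetical P*_j(theta): 3-parameter logistic form."""
    return lower + (1.0 - lower) * logistic(theta - 0.2)


def branch_probs(theta, omega, c=0.25):
    """Joint probabilities of (m_j, u_j) from the response tree, under Condition (vii)."""
    R, Ps = r_no_preference(theta), p_star(theta)
    return {
        (1, 1): (1 - R) * Ps + R * (1 - omega) * c,              # observed, correct
        (1, 0): (1 - R) * (1 - Ps) + R * (1 - omega) * (1 - c),  # observed, incorrect
        (0, 1): R * omega * c,                                   # omitted, would be correct
        (0, 0): R * omega * (1 - c),                             # omitted, would be incorrect
    }


def p_double_star(theta, omega, c=0.25):
    """P**(theta, omega) = Prob(U_j = 1 | M_j = 1, theta, omega)."""
    probs = branch_probs(theta, omega, c)
    return probs[(1, 1)] / (probs[(1, 1)] + probs[(1, 0)])


def p_correct_marginal(theta, omega, c=0.25):
    """Prob(U_j = 1 | theta, omega), summing over observed and omitted branches."""
    probs = branch_probs(theta, omega, c)
    return probs[(1, 1)] + probs[(0, 1)]


marginals = [p_correct_marginal(0.4, omega) for omega in (0.0, 0.3, 0.9)]
```

All entries of `marginals` coincide up to rounding, which is the numerical counterpart of the statement that ω drops out of (7.7), while `p_double_star` does change with ω.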


Remark. Lord points out that if $\omega = 0$ for all examinees (there is no possibility of omitting),
then the resulting IRT model is just (7.7). In a manner described by Samejima (1979), the
operating characteristic curve for a correct response, or $\mathrm{Prob}(U_j = 1 \mid \theta)$, need not be
monotone increasing in $\theta$. Very low ability examinees would feel no preference at all, and
answer correctly at a rate equal to c; moderate-ability examinees might tend to feel a
preference for a clever distractor and answer correctly at a rate lower than c; and
high-ability examinees would tend to feel preferences and respond correctly.


Lord’s  focus  on  the  nonignorable  nature  of  intentional  omissions  is  apparent  in  the 
following  result. 

Theorem 7.2.4. Consider the case of one item, or n=1. (Subscripts on R, P*, M, etc.,
may thus be suppressed.) Under Lord’s (1983) model, augmented by Condition (vii),
omitting is ignorable with respect to direct likelihood inference about $\theta$ only under the
degenerate conditions that either

(a) $\omega = 0$ (i.e., there is no possibility of omitting);

(b) both $R(\theta)$ and $P^*(\theta)$ are constant with respect to $\theta$ (i.e., neither the tendency to
feel a preference leading to a response, nor the chances of an observed response
being correct, depend on $\theta$); or

(c) $R(\theta) = 0$ (i.e., no examinee is ever undecided among the responses, so no one omits).

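Proof. If $\omega = 0$, there is no possibility of omitting, so M = 1, $U = U_{\mathrm{obs}}$, and, for all
$u_{\mathrm{obs}}$, $\mathrm{Prob}(M = 1 \mid u_{\mathrm{obs}}, \theta, \omega) = 1$. Then

\[
\begin{aligned}
\mathrm{Prob}(V = v \mid \theta, \omega) &= \mathrm{Prob}(M = 1, U = u_{\mathrm{obs}} \mid \theta, \omega) \\
&= \mathrm{Prob}(M = 1 \mid U = u_{\mathrm{obs}}, \theta, \omega)\, \mathrm{Prob}(U = u_{\mathrm{obs}} \mid \theta, \omega) \\
&= 1 \times \mathrm{Prob}(U = u_{\mathrm{obs}} \mid \theta, \omega) \\
&= \mathrm{Prob}(U = u_{\mathrm{obs}} \mid \theta),
\end{aligned}
\]

and omitting is (trivially) ignorable.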

Now suppose $\omega \neq 0$. Theorem 2.2, NS conditions for ignorability under
likelihood inference, requires that the parameter spaces of $\theta$ and $\phi$ are distinct; and that for
each $\phi$, the expression in (2.8), or
$E_{u_{\mathrm{mis}}}\{g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi\}$, is constant with
respect to $\theta$. As for the requirement of distinctness: In general, $\theta$ is included in $\phi$ in
Lord’s model (that is, $\phi = (\theta, \omega)$), since the probability that an examinee with ability $\theta$
will omit Item j is $\omega R_j(\theta)$. If $\omega \neq 0$, then only when $R_j(\theta)$ is constant with respect to $\theta$
for all items do the parameter spaces of $\theta$ and $\phi$ become distinct, since $\phi$ then simplifies
to $\omega$ alone. Thus, for distinctness to be satisfied for direct-likelihood ignorability when
n=1 and $\omega \neq 0$, it is necessary that $R(\theta)$ is constant with respect to $\theta$.

As for the further requirement that (2.8) be constant with respect to $\theta$, we consider
separately v = (0,*), (1,1), and (1,0), since for ignorability to hold in general, it must hold
for each potential pattern of observations. The required expressions are derived from
Figure 3.

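(0,*): If v = (0,*), then

\[
\begin{aligned}
E_{u_{\mathrm{mis}}}\{g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi\}
&= \mathrm{Prob}(M = 0 \mid U = 0, \theta, \omega)\, \mathrm{Prob}(U = 0 \mid \theta, \omega) + \mathrm{Prob}(M = 0 \mid U = 1, \theta, \omega)\, \mathrm{Prob}(U = 1 \mid \theta, \omega) \\
&= \mathrm{Prob}(M = 0, U = 0 \mid \theta, \omega) + \mathrm{Prob}(M = 0, U = 1 \mid \theta, \omega) \\
&= R(\theta)\,\omega\,(1 - c) + R(\theta)\,\omega\, c \\
&= R(\theta)\,\omega.
\end{aligned}
\]

This expression is in fact constant in $\theta$ if $R(\theta)$ is constant in $\theta$, so ignorability holds
under this condition if the observation is an omitted response. We now see, however, that
this is not sufficient when the observation is an observed right or wrong response.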


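(1,1) and (1,0): If v = (1,1), then there is no $u_{\mathrm{mis}}$ to integrate over and

\[
\begin{aligned}
E_{u_{\mathrm{mis}}}\{g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi\}
&= \mathrm{Prob}(M = 1 \mid U = 1, \theta, \omega) \\
&= \frac{\mathrm{Prob}(M = 1, U = 1 \mid \theta, \omega)}{\mathrm{Prob}(U = 1 \mid \theta, \omega)} \\
&= \frac{[1 - R(\theta)]\, P^*(\theta) + c\,R(\theta)\,(1 - \omega)}{[1 - R(\theta)]\, P^*(\theta) + c\,R(\theta)}.
\end{aligned}
\tag{7.8a}
\]

Similarly, if v = (1,0),

\[
\begin{aligned}
E_{u_{\mathrm{mis}}}\{g_\phi(m \mid u_{\mathrm{mis}}, u_{\mathrm{obs}}) \mid m, u_{\mathrm{obs}}, \theta, \phi\}
&= \mathrm{Prob}(M = 1 \mid U = 0, \theta, \omega) \\
&= \frac{\mathrm{Prob}(M = 1, U = 0 \mid \theta, \omega)}{\mathrm{Prob}(U = 0 \mid \theta, \omega)} \\
&= \frac{[1 - R(\theta)]\,[1 - P^*(\theta)] + (1 - c)\,R(\theta)\,(1 - \omega)}{[1 - R(\theta)]\,[1 - P^*(\theta)] + (1 - c)\,R(\theta)}.
\end{aligned}
\tag{7.8b}
\]

Given that $\omega \neq 0$, both (7.8a) and (7.8b) are constant with respect to $\theta$ if either $R(\theta) = 0$
or both $R(\theta)$ and $P^*(\theta)$ are constant with respect to $\theta$ (the degenerate case in which the
item conveys no information whatsoever about $\theta$). □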
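The behavior of (7.8a) and (7.8b) in the proof of Theorem 7.2.4 can be explored numerically. In this sketch the values of R and P* are supplied directly as hypothetical numbers standing in for their values at a low and a high θ; no functional form is assumed.

```python
def prob_observed_given_right(R, P_star, omega, c):
    """(7.8a): Prob(M = 1 | U = 1, theta, omega) at given values of R and P*."""
    numerator = (1 - R) * P_star + c * R * (1 - omega)
    denominator = (1 - R) * P_star + c * R
    return numerator / denominator


def prob_observed_given_wrong(R, P_star, omega, c):
    """(7.8b): Prob(M = 1 | U = 0, theta, omega) at given values of R and P*."""
    numerator = (1 - R) * (1 - P_star) + (1 - c) * R * (1 - omega)
    denominator = (1 - R) * (1 - P_star) + (1 - c) * R
    return numerator / denominator


c, omega = 0.25, 0.5
# Hypothetical (R, P*) pairs standing in for a low and a high value of theta.
low, high = (0.8, 0.3), (0.2, 0.9)
varies_right = prob_observed_given_right(*low, omega, c) != prob_observed_given_right(*high, omega, c)
varies_wrong = prob_observed_given_wrong(*low, omega, c) != prob_observed_given_wrong(*high, omega, c)
```

When R is set to zero, or ω is set to zero, both functions return 1 whatever the other arguments, which matches the degenerate cases (a) and (c) of the theorem.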


7.2.4  Nominal  Category  Models 

IRT  models  for  multiple-category  items  give  probabilities  of  item  responses  as 
conditionally  independent  functions  of  6  (e.g.,  Bock,  1972;  Samejima,  1979;  Thissen  & 
Steinberg,  1986).  These  models  have  sometimes  been  used  for  data  with  intentional 
omissions,  with  an  omit  treated  as  one  more  possible  response  to  a  multiple-choice  item. 
Lord  (1983)  expresses  reservations  about  this  practice,  “...  since  it  treats  probability  of 
omitting  as  dependent  only  on  the  examinee’s  ability,  whereas  it  actually  depends  on  a 
dimension  of  temperament.  It  seems  likely  that  local  unidimensional  independence  may  not 
hold”  (p.  477).  The  following  analysis  makes  Lord’s  concerns  more  explicit. 

The character of this approach regarding omission is seen most easily in the case
in which all incorrect overt responses are collapsed into a single category. Suppose the
following conditions hold:

(viii) A SMURFLI(2) IRT model in $\theta$ governs U;

(ix) $\phi$ characterizes an examinee’s tendency to omit, via $g_\phi(m \mid u)$; and

(x) $(u_j, m_j)$ are conditionally independent over items given $(\theta, \phi)$; that is,

\[ \mathrm{Prob}(U = u, M = m \mid \theta, \phi) = \prod_j \mathrm{Prob}(U_j = u_j, M_j = m_j \mid \theta, \phi), \]

which in turn implies

\[ \mathrm{Prob}(V = v \mid \theta, \phi) = \prod_j \mathrm{Prob}(V_j = v_j \mid \theta, \phi). \tag{7.9} \]

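Recalling that the values 0, 1, and * of v indicate observed wrong, observed right,
and omit, we obtain the multiple-category model probabilities as follows:

\[
\begin{aligned}
f_\theta(V_j = 0) &= \mathrm{Prob}(U_j = 0, M_j = 1 \mid \theta) \\
&= \int f_\theta(U_j = 0)\, g_\phi(M_j = 1 \mid U_j = 0)\, p(\phi \mid \theta)\, d\phi,
\end{aligned}
\tag{7.10a}
\]

\[
\begin{aligned}
f_\theta(V_j = 1) &= \mathrm{Prob}(U_j = 1, M_j = 1 \mid \theta) \\
&= \int f_\theta(U_j = 1)\, g_\phi(M_j = 1 \mid U_j = 1)\, p(\phi \mid \theta)\, d\phi,
\end{aligned}
\tag{7.10b}
\]

and

\[
\begin{aligned}
f_\theta(V_j = *) &= \mathrm{Prob}(U_j = 0, M_j = 0 \mid \theta) + \mathrm{Prob}(U_j = 1, M_j = 0 \mid \theta) \\
&= \int f_\theta(U_j = 0)\, g_\phi(M_j = 0 \mid U_j = 0)\, p(\phi \mid \theta)\, d\phi
 + \int f_\theta(U_j = 1)\, g_\phi(M_j = 0 \mid U_j = 1)\, p(\phi \mid \theta)\, d\phi.
\end{aligned}
\tag{7.10c}
\]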


Theorem 7.2.5. Under Conditions (viii)-(x) above, the multiple-category response
functions in $\theta$ for v, or (7.10a)-(7.10c), exhibit conditional independence over items only
if all examinees with each given $\theta$ have the same value of $\phi$.

Proof. The conditional probability of $V_j$ given $\theta$, marginal with respect to $\phi$, is

$$\mathrm{Prob}(V_j = v_j \mid \theta) = \int \mathrm{Prob}(V_j = v_j \mid \theta, \phi)\, p(\phi \mid \theta)\, d\phi.$$

If local independence given $\theta$ is to hold across items, it follows that

$$\mathrm{Prob}(V = v \mid \theta) = \prod_j \int \mathrm{Prob}(V_j = v_j \mid \theta, \phi)\, p(\phi \mid \theta)\, d\phi. \tag{7.11}$$

But marginalizing over $\phi$ in $\mathrm{Prob}(V = v \mid \theta, \phi)$ to obtain $\mathrm{Prob}(V = v \mid \theta)$ yields

$$\mathrm{Prob}(V = v \mid \theta) = \int \mathrm{Prob}(V = v \mid \theta, \phi)\, p(\phi \mid \theta)\, d\phi = \int \prod_j \mathrm{Prob}(V_j = v_j \mid \theta, \phi)\, p(\phi \mid \theta)\, d\phi. \tag{7.12}$$

Expressions (7.11) and (7.12) differ only in that the order of integration and multiplication
is interchanged, and are equal in general only if $\phi$ is constant for each given value of $\theta$. □
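
The gap between the integral of a product and the product of integrals can be checked numerically. The following is a minimal sketch, not the authors' computation: it assumes a Rasch item with difficulty 0, a hypothetical omit model in which an examinee omits with probability phi only when the response would be wrong, and a two-point distribution for phi given theta. The function names are illustrative.

```python
import math

def rasch(theta, b=0.0):
    """Correct-response probability under a Rasch item (assumed IRT form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def category_probs(theta, phi):
    """Prob(V_j = 0), Prob(V_j = 1), Prob(V_j = '*') given (theta, phi).

    Hypothetical omit model: omit with probability phi when the response
    would be wrong, never omit when it would be right.
    """
    p = rasch(theta)
    return {0: (1.0 - p) * (1.0 - phi), 1: p, "*": (1.0 - p) * phi}

def integral_of_product(v, theta, phi_dist):
    """Average the product over items against p(phi | theta)."""
    return sum(w * math.prod(category_probs(theta, phi)[vj] for vj in v)
               for phi, w in phi_dist)

def product_of_integrals(v, theta, phi_dist):
    """Multiply the per-item marginals, as local independence requires."""
    return math.prod(sum(w * category_probs(theta, phi)[vj] for phi, w in phi_dist)
                     for vj in v)

theta = 0.0
v = ("*", "*")                              # both items omitted
varying = [(0.1, 0.5), (0.9, 0.5)]          # phi varies among examinees at this theta
constant = [(0.5, 1.0)]                     # everyone at this theta shares one phi

gap_varying = integral_of_product(v, theta, varying) - product_of_integrals(v, theta, varying)
gap_constant = integral_of_product(v, theta, constant) - product_of_integrals(v, theta, constant)
print(gap_varying, gap_constant)
```

Under these assumptions the two expressions differ when phi varies at a fixed theta and coincide when phi is constant, which is the content of the theorem for this toy case.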

8.0  Examinee  Choice  of  Items 

The instructions to one section of the College Entrance Examination Board’s 1905
German test read, “Answer only one of the following six questions,” and the 1905 Botany
exam included a section of ten items, of which students had to answer any seven (Wainer
& Thissen, 1994, p. 161). In this section we consider tests that (a) incorporate choice in
such a manner and (b) posit a SMURFLI(2) IRT model. These tests present problems of
inference with respect to the missing responses to non-chosen items that are formally
identical to those associated with intentional omissions: The students examine all items,
consider their chances of success on each, and choose which to answer in a manner that may
depend on their actual chances of success through $\theta$. The only difference is the constraint
on possible missingness patterns, which is irrelevant to Rubin’s ignorability conditions.
Section 8.1 recasts the results of Section 7.1 in the context of examinee choice of items,
concluding that if an underlying SMURFLI(2) IRT model is to be maintained, it is
necessary to model missingness and response jointly. Section 8.2 discusses how the
generally nonignorable missingness can be modeled in the IRT context.

8.1  On  the  Ignorability  of  Responses  to  Non-Chosen  Items 

As with intentional omits, motivated examinees facing choice situations attempt to
make choices and responses that maximize the score they expect to receive. We assume a
SMURFLI(2) IRT model, and consider test scores in the form of counts of correct responses,
$R(v)$. Since a correct response to an item gives a higher test score than an incorrect
response, examinees who want to obtain high scores will choose an allowable missingness
pattern in which they can make responses they think are likely to be correct. As with
omits, examinees’ perceived probabilities of correct response are not necessarily the same
as those given by the IRT model, and examinees may differ in the accuracy of their
estimates. Such characteristics of an examinee constitute the choice parameter $\phi$. The
intuition, supported by studies such as that of Wang, Wainer, and Thissen (1993), again
suggests that examinees with high abilities are typically better at projecting their expected
scores under different choice patterns and responding to item configurations that lead them
to higher scores.

Theorem 8.1.1. For a given $\tilde{m}$ and $\phi$, the missingness process induced by examinee
choice is MAR only if this missingness pattern is equally likely for all values of $u_{mis}$.

Proof. This follows immediately from the definition of MAR. □

Remark. If examinees are more likely to avoid choosing items when they think their
responses would lead to lower scores, MAR implies the untenable belief that their
perceptions of correctness are independent of the probabilities given by the IRT model.

Assuming a monotone unidimensional IRT model and considering a given
missingness pattern $\tilde{m}$ induced by examinee choice, the NS conditions for ignorability in
Theorems 2.2 and 2.3 say that for each value of $\phi$ the probability of $\tilde{m}$ is the same for all
$\theta$ even though the expected number-right score of the non-chosen items, or
$E[R(u_{mis}) \mid \theta]$, is increasing in $\theta$ (by Theorem 7.1.1). That is, high-ability examinees
are just as likely to produce this choice pattern as low-ability examinees (given $u_{obs}$), even
though the associated responses have a greater expected contribution to their total test score
and the maximum ‘chosen $N$’ would be higher. A counter-intuitive result is easiest to see
when $n=2$:

Theorem 8.1.2. Suppose a SMURFLI(2) IRT model holds in a domain consisting of two
items ($j=1,2$), and the missingness induced by “answer any one of two” format is ignorable
under likelihood or Bayesian inference. Suppose further that $U_1=0$. Then (a) the
conditional probability that an examinee will choose to present this response rather than the
response to $U_2$ must be constant with respect to $\theta$, even though (b) $\mathrm{Prob}(U_2=1)$ is
increasing with respect to $\theta$.

Proof. The NS condition for ignorability under likelihood inference (2.8) requires that for
each value of $\phi$, the probability that missingness pattern $\tilde{m}$ will be observed given
$u_{obs}$, $\theta$, and $\phi$ is constant with respect to $\theta$. The theorem addresses the case in
which $n=2$, $m = (1,0)$, and $v = (0,*)$. Because the IRT model is strictly monotonic, the
probability of the missing response being correct is increasing in $\theta$. The same argument
holds for ignorability under Bayesian inference, since, by Theorem 2.3, it is required that
the expectation of (2.8) over $\phi$, or (2.9), is constant with respect to $\theta$. □

8.2 Modeling Examinee Choice Within the IRT Framework

This section addresses the information in examinee-choice test responses about $\theta$.
Because missingness due to examinee choice is not generally ignorable, appropriate
inference under the IRT framework requires working from $v$ rather than simply $u_{obs}$.
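
To see why choice is not generally ignorable, consider the two-item “answer any one of two” setting with $U_1 = 0$. The sketch below is illustrative only: it assumes a Rasch item with difficulty 0 and a hypothetical efficient chooser who knows both responses, presents a correct one when available, and picks at random between tied responses.

```python
import math

def rasch(theta, b=0.0):
    """Correct-response probability under a Rasch item (assumed IRT form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def prob_present_item1_wrong(theta):
    """Probability of choice pattern m = (1, 0) given U1 = 0 and theta.

    Assumption: the examinee knows both responses, presents a correct one
    when available, and picks at random when the two are tied.
    """
    p2 = rasch(theta)
    # U2 = 1: item 2 is presented instead, so m = (1, 0) has probability 0.
    # U2 = 0: tie between two wrong responses, so m = (1, 0) has probability 1/2.
    return p2 * 0.0 + (1.0 - p2) * 0.5

values = [prob_present_item1_wrong(t) for t in (-2.0, 0.0, 2.0)]
print(values)
```

Under this assumed choice rule the probability of presenting the wrong response to item 1 falls as ability rises, so it is not constant in the ability parameter and the constancy required for ignorability fails.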

Wainer and Thissen (1994) point out that, as with omits, the correct model cannot be
ascertained from observations of $v$ alone. Supplemental data collections of $(u,m)$, as in
Wang, Wainer, and Thissen (1993), suffice to build a model. That study asked
examinees to respond to both of two items and also to indicate which one they would have
chosen if only one were to have been scored. From such data it is possible to estimate both
$f_\theta(u)$, the IRT model, and $g_\phi(m \mid u)$, the choice model. Note that $g$ must assign a
probability of zero to any $m$ other than the $\binom{n}{N} = n!/[N!(n-N)!]$ patterns consistent
with “answer any $N$ of $n$” format.

Theorem 8.2.1. The likelihood function for $\theta$ induced under an LI IRT model when
responses are observed under the “answer any $N$ of $n$” format is the weighted average of
the likelihoods of all possible response vectors $u$ consistent with $u_{obs}$, with the weights
associated with each pattern of non-observed responses being the probabilities of the
observed choice pattern given $u_{obs}$ and the non-observed responses.

Proof. This is a specialization of (2.2) to the context of choice. The likelihood function
induced by $v$ is

$$L(\theta, \phi \mid v) = \delta_{\theta\phi} \int f_\theta(u_{mis}, u_{obs})\, g_\phi(m \mid u_{mis}, u_{obs})\, du_{mis}. \qquad \Box$$
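
For dichotomous items the integral over the non-observed responses is a finite sum, so the weighted average can be written out directly. The sketch below is an illustration under stated assumptions rather than the authors' code: items are Rasch items with difficulty 0, the leading indicator factor of the likelihood is omitted, and `g(m, u)` is a hypothetical signature for a user-supplied choice model.

```python
import itertools
import math

def rasch(theta, b=0.0):
    """Correct-response probability under a Rasch item (assumed IRT form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(u, theta):
    """f_theta(u): product of item probabilities under local independence."""
    p = rasch(theta)
    return math.prod(p if uj == 1 else 1.0 - p for uj in u)

def choice_likelihood(v, theta, g):
    """Sum f_theta(u_mis, u_obs) * g(m | u) over all completions of v.

    v holds 0/1 for presented responses and None for non-chosen items;
    g(m, u) returns the probability of choice pattern m given complete u.
    """
    m = tuple(0 if vj is None else 1 for vj in v)
    missing = [j for j, vj in enumerate(v) if vj is None]
    total = 0.0
    for fill in itertools.product((0, 1), repeat=len(missing)):
        u = list(v)
        for j, uj in zip(missing, fill):
            u[j] = uj
        u = tuple(u)
        total += pattern_likelihood(u, theta) * g(m, u)
    return total

def mcar_choice(m, u):
    """Choice model that ignores u: every allowable pattern is equally likely."""
    return 1.0 / math.comb(len(m), sum(m))

v = (1, 1, None, None, None)
print(choice_likelihood(v, 0.7, mcar_choice), 0.1 * rasch(0.7) ** 2)
```

With a choice model that does not depend on the responses, the sum collapses to a constant times the likelihood of the presented responses alone; any other choice model can be swapped in through `g`.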


8.2.1  Special  Cases 

This section gives $g_\phi(m \mid u_{mis}, u_{obs})$ for some special cases of choice behavior with
SMURFLI(2) items and “answer any $N$ of $n$” format.


Special case #1: An MCAR choice mechanism. If an examinee’s choice of items is
independent of $\theta$, then $g$ assigns equal probability to all $m$ consistent with the instructions
and zero to all others:

$$g_\phi(m \mid u) = \begin{cases} K^{-1} & \text{if } \sum_j m_j = N \\ 0 & \text{otherwise,} \end{cases} \tag{8.1}$$

where $K = \binom{n}{N}$. The likelihood under ignorability can be expressed as the equally
weighted average of all complete response patterns consistent with $u_{obs}$:

$$L(\theta \mid v) = \delta_\theta \int f_\theta(u_{mis}, u_{obs})\, K^{-1}\, du_{mis}, \tag{8.2}$$

which is proportional to $L(\theta \mid u_{obs})$. In this situation, $u_{obs}$ is representative of an
examinee’s typical performance, despite the choice format.


Special case #2: Fully efficient choice, unique maximum score. Suppose an examinee
has perfect knowledge of $u$ and chooses $m$ in such a way as to maximize the test score,
$T(v)$. If the highest possible score is obtained by a unique pattern $v^* = (u^*_{obs}, m^*)$, then

$$g_\phi(m \mid u_{mis}, u_{obs}) = \begin{cases} 1 & \text{if } T(v) = T(v^*) \text{ and } \sum_j m_j = N \\ 0 & \text{otherwise.} \end{cases} \tag{8.3}$$

By Theorem 8.2.1, $L(\theta, \phi \mid v)$ in this case is an equally weighted average of all
$f_\theta(u_{mis}, u_{obs})$ possibilities in which any other choice pattern would have yielded a lower
score than $v$. In this case and the next, $u_{obs}$ represents maximal performance.


Special case #3: Fully efficient choice, nonunique maximum score. Suppose that an
examinee has perfect knowledge of $u$ and chooses $m$ in such a way as to maximize the test
score, $T(v)$. If $v^* = (u^*_{obs}, m^*)$ is one of $H$ patterns that yield the highest score from $u$
and the examinee chooses one of these patterns at random, then

$$g_\phi(m \mid u_{mis}, u_{obs}) = \begin{cases} H^{-1} & \text{if } T(v) = T(v^*) \text{ and } \sum_j m_j = N \\ 0 & \text{otherwise.} \end{cases} \tag{8.4}$$


Special case #4: Partially efficient choice. Suppose that an examinee seeks to choose $m$
in such a way as to maximize the test score, $T(v)$, but has imperfect knowledge of $u$. One
specification of $g$ that takes this intention into account while softening its effect is a linear
combination of the ignorability weights (8.1) and the efficient-choice weights, namely
(8.3) or (8.4) as appropriate. For a non-unique maximum score, for example,

$$g_\phi(m \mid u_{mis}, u_{obs}) = \begin{cases} \phi H^{-1} + (1 - \phi) K^{-1} & \text{if } T(v) = T(v^*) \text{ and } \sum_j m_j = N \\ (1 - \phi) K^{-1} & \text{if } T(v) < T(v^*) \text{ and } \sum_j m_j = N \\ 0 & \text{otherwise,} \end{cases} \tag{8.5}$$

with $0 < \phi < 1$. There are $K$ patterns with $N$ observed responses, $H$ of which yield the
maximal score and $K-H$ of which yield lower scores; it can be verified that the probabilities
in (8.5) sum to 1 over the $K$ allowable patterns. Efficient-choice weights that presume $u_{obs}$
represents “maximum performance” are thus mixed with ignorability weights that presume
it represents “typical performance”.
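
The four special cases can be collected into one weight function and the sum-to-one property checked by enumeration. This is a minimal sketch under stated assumptions, not the authors' implementation: number-right scoring is assumed, $H$ is counted as the number of $N$-item choice patterns that reach the top score for a given complete pattern, and the function names are hypothetical.

```python
import itertools
import math

def n_top_patterns(u, N):
    """H(u): number of N-item choice patterns reaching the top number-right score."""
    n, s = len(u), sum(u)
    k = min(N, s)                           # best attainable number right
    return math.comb(s, k) * math.comb(n - s, N - k)

def choice_weight(m, u, N, phi):
    """Mixture of ignorability and efficient-choice weights.

    phi = 0 gives the equal weights 1/K; phi = 1 gives the efficient-choice
    weights 1/H on top-scoring patterns and 0 elsewhere.
    """
    if sum(m) != N:
        return 0.0                          # pattern violates "answer any N of n"
    K = math.comb(len(u), N)
    score = sum(uj for uj, mj in zip(u, m) if mj == 1)
    top = min(N, sum(u))
    efficient = 1.0 / n_top_patterns(u, N) if score == top else 0.0
    return phi * efficient + (1.0 - phi) / K

def total_weight(u, N, phi):
    """Sum of the weights over every allowable choice pattern m."""
    n = len(u)
    return sum(choice_weight(tuple(1 if j in chosen else 0 for j in range(n)), u, N, phi)
               for chosen in itertools.combinations(range(n), N))

m = (1, 1, 0, 0, 0)
u = (1, 1, 0, 1, 1)
print(choice_weight(m, u, 2, 1.0), choice_weight(m, u, 2, 0.9), total_weight(u, 2, 0.9))
```

Passing `phi` as an argument keeps the ignorable and fully efficient cases as the two endpoints of a single function, which makes the mixture easy to vary.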

Remark. Special Case #4 might apply to a not-uncommon practice with IRT: the
administration of “customized tests,” in which a school or district selects items from an
item bank for which item parameters have been estimated from a non-targeted reference
population of examinees (Yen, Green, & Burket, 1987). The items of a customized test are
chosen because they are more relevant to the local curriculum than the non-chosen items, so
the administrator is effectively acting as an agent of the local students in a choice exam. If
this choice is ignored, over-estimates of $\theta$ result, in the manner shown in the following example.

8.2.2  A  Numerical  Example 

This example concerns “answer any $N$ of $n$” format with iid SMURFLI(2) items
following an IRT model of the form $f_\theta(u_j) = \exp(\theta)/[1 + \exp(\theta)]$, with
$p(\theta) = N(0,2)$ and number-right scoring. In particular, we consider $n=5$ and $N=2$, for
$v = (1,1,*,*,*)$ and $v = (1,0,*,*,*)$ under three alternatives to modeling the choice process.

The first panel in Figure 4 gives the likelihood functions for the eight possible $u$s
consistent with $v = (1,1,*,*,*)$. $L(\theta \mid v)$ is a weighted average of these, with weights
depending on how the choice process is modeled. Table 1 gives weights $g_\phi(m \mid u)$ under
ignorability, efficient choice (8.4), and a 9:1 mixture of efficient-choice and ignorability
weights via (8.5). The ignorability weights are all $\binom{5}{2}^{-1}$, or .10. For efficient
choice, $g_\phi(m \mid u)$ depends on the number of different $m$s with a score of 2 that could be
made from the complete pattern, or $H(u)$. When $u = (1,1,0,1,1)$, for example,
$H(u) = \binom{4}{2} = 6$, so $H(u)^{-1} = .167$ appears in the efficient-choice column for this pattern.
The resulting $L(\theta \mid v)$s appear in the second panel of Figure 4. The likelihood under
efficient choice flattens out above 0, whereas the ignorability likelihood continues to rise
rapidly; examinees above zero are likely to obtain at least two 1’s in $u$, so this
observational scheme provides little evidence for distinguishing among them. The
distributions $p(\theta \mid v)$ appear in the final panel of Figure 4, and the posterior mean and
standard deviation for $\theta$ in each case appear in Table 1. The flattened likelihood for the
efficient-choice condition leads to a much lower posterior mean and a somewhat larger
posterior standard deviation than are obtained under ignorability.

[Table 1 & Figure 4 about here]

Similar results for $v = (1,0,*,*,*)$ appear in Table 2 and Figure 5. Efficient choice
again yields a much lower posterior mean than the ignorability condition, but now,
interestingly, it leads to a smaller posterior standard deviation. This is because under
efficient choice, observing $u_{obs}$ with only one correct response means that the three
missing responses must all be zero; i.e., under efficient choice, observing $v = (1,0,*,*,*)$
is equivalent to observing $u = (1,0,0,0,0)$.
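
The qualitative pattern in this example can be explored with a simple grid approximation. The sketch below is illustrative and rests on assumptions not confirmed by the text: Rasch items with difficulty 0, a normal prior with mean 0 and variance 2, and efficient-choice weights counted over choice patterns. The moments it produces depend on those assumptions and should not be expected to match Tables 1 and 2.

```python
import itertools
import math

def rasch(theta):
    """Assumed item model: Rasch item with difficulty 0."""
    return 1.0 / (1.0 + math.exp(-theta))

def choice_weight(m, u, N, phi):
    """Mixture weight: phi * efficient-choice weight + (1 - phi) * equal weight."""
    s, K = sum(u), math.comb(len(u), N)
    k = min(N, s)                                   # top attainable number right
    H = math.comb(s, k) * math.comb(len(u) - s, N - k)
    score = sum(uj for uj, mj in zip(u, m) if mj == 1)
    return phi * (1.0 / H if score == k else 0.0) + (1.0 - phi) / K

def likelihood(v, theta, phi):
    """Weighted sum of complete-pattern likelihoods consistent with v."""
    m = tuple(0 if vj is None else 1 for vj in v)
    N, p = sum(m), rasch(theta)
    missing = [j for j, vj in enumerate(v) if vj is None]
    total = 0.0
    for fill in itertools.product((0, 1), repeat=len(missing)):
        u = list(v)
        for j, uj in zip(missing, fill):
            u[j] = uj
        f_u = math.prod(p if uj == 1 else 1.0 - p for uj in u)
        total += f_u * choice_weight(m, tuple(u), N, phi)
    return total

def posterior_moments(v, phi, prior_var=2.0, half_width=8.0, steps=1601):
    """Posterior mean and variance of theta on an evenly spaced grid."""
    grid = [-half_width + 2.0 * half_width * i / (steps - 1) for i in range(steps)]
    dens = [math.exp(-t * t / (2.0 * prior_var)) * likelihood(v, t, phi) for t in grid]
    total = sum(dens)
    mean = sum(t * d for t, d in zip(grid, dens)) / total
    var = sum((t - mean) ** 2 * d for t, d in zip(grid, dens)) / total
    return mean, var

results = {
    (label, name): posterior_moments(v, phi)
    for label, v in (("11", (1, 1, None, None, None)), ("10", (1, 0, None, None, None)))
    for name, phi in (("ignorable", 0.0), ("efficient", 1.0), ("mixed", 0.9))
}
for key, (mean, var) in sorted(results.items()):
    print(key, round(mean, 2), round(var, 2))
```

Under these assumptions, efficient choice pulls the posterior mean down for both response patterns and narrows the posterior when only one presented response is correct.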

[Table  2  &  Figure  5  about  here] 

In this example, examinee choice makes inference about $\theta$ less efficient for higher
values, where the question is about the average value of the $u_j$s, but more efficient for very low
values, where the question is more one of the existence of any $u_j$s that are 1. Departure
from efficient-choice behavior erodes this potential advantage. Results such as these led
both Wainer and Thissen (1994) and Bradlow and Thomas (in review) to conclude that
examinee choice and IRT-based inference are an unattractive combination. IRT-based
inference works well when one is interested in typical performance in a domain of
exchangeable tasks, to be characterized by a single ability variable $\theta$; examinee choice is
attractive when what is of interest is not typical performance in a domain of exchangeable
tasks, but the maximal performance in special circumstances best known to the examinee.
An alternative modeling approach for such circumstances requires the specification of a common
framework of evaluation into which performances in different contexts can be mapped
(e.g., the Advanced Placement Studio Art portfolio assessment, in which students have
considerable choice as to media, style, and subjects; see Myford & Mislevy, 1995).

9.0 Summary

In practical applications of item response theory (IRT), there are several reasons
that item responses may not be observed from all examinees to all test items. We used
Rubin’s (1976) theorems to determine whether ignorability holds under direct likelihood
and Bayesian inference about examinee parameters $\theta$ under six common types of
missingness in IRT, with item parameters known. Ignoring the missingness process under
direct likelihood inference means using a pseudo-likelihood that includes terms for only the
responses that were observed, without regard for the processes by which they came to be
observed. The resulting inferences are appropriate if the pseudo-likelihood is proportional
to the correct likelihood that does account for the missingness process. The missingness is
ignorable with respect to Bayesian inference if the correct posterior is proportional to the
product of the pseudo-likelihood and an appropriate prior distribution. Our findings are
summarized below. Table 3 highlights the results on ignorability.

[Insert  Table  3  about  here] 

Alternate test forms. When an examinee is assigned one of several alternative test
forms by a random process such as a coin flip or a spiraling scheme, the process that
renders missing the responses to items on the forms not presented is ignorable under both
likelihood and Bayesian inference for $\theta$.

Targeted testing. When covariates such as educational or demographic status are
used to assign an examinee one of several tests that differ in their measurement properties,
the resulting missingness on forms not given is ignorable under direct likelihood inference
for $\theta$, but not under Bayesian inference unless the prior information about examinees that
led to differential assignments is conditioned on.

Adaptive testing. The missingness caused by the selection of items to present to an
examinee based on observed responses to previous items is ignorable under both direct
likelihood and Bayesian inference. It should be noted that ignorability under direct
likelihood inference means that the correct points are identified as MLEs of $\theta$, but the usual
MLE properties under sampling-distribution inference need not hold because the
probabilities of missingness patterns depend on the values of observed responses.

Not-reached items. When some examinees do not interact with the last items on a
nearly unspeeded test, the not-reached process is ignorable with respect to direct likelihood
inference about $\theta$. This missingness process is not ignorable under Bayesian inference
unless speed and ability are independent.

Omitted items. When examinees are presented items, appraise their content, and
decide for their own reasons not to respond, the missingness is not generally ignorable.
Inferences must be drawn from a full model for the joint distribution of missingness and
item response, as sketched in Lord (1983). Under the assumption that examinees are
perfect judges of their chances of responding correctly, and omit only if it is in accordance
with the strategy that maximizes their expected score, Lord’s (1974) treatment of omits as
fractionally correct under a standard IRT model can be justified as providing the expectation
of a conditional term in the full likelihood. This procedure is readily incorporated into
standard complete-data IRT algorithms and avoids having to specify the full likelihood, but
forgoes information about examinee and item parameters conveyed by the observed
pattern of missingness.

Examinee choice of items. Insofar as ignorability conditions are concerned,
examinee choice is equivalent to intentional omission. Choice is not generally ignorable,
and treating it as such typically overestimates the information about $\theta$. This includes
choice made by a test administrator on behalf of examinees (e.g., school officials pick out
a “customized test” aligned to their curriculum, but use item parameters estimated from a
non-targeted reference population). It is possible, with supplemental data from
experiments, to estimate the typically lower and more diffuse likelihoods for $\theta$ arising
from choice in IRT domains, but the administration scheme of examinee choice of tasks is
ill-suited to domain-referenced IRT inference. Conditional evaluation of choice
performances within a common framework of evaluation seems better suited to tasks that
evidence targeted skills only given ancillary skills.

Table 1

Weighting Factors for Likelihoods when v = (1,1,*,*,*)

                                      $g_\phi(m \mid u)$
u                    Ignorability   Fully-Efficient Choice   Partially-Efficient Choice, $\phi$ = .9
11000                .10            1.00                     .91
11001                .10            .33                      .31
11010                .10            .33                      .31
11011                .10            .17                      .16
11100                .10            .33                      .31
11101                .10            .17                      .16
11110                .10            .17                      .16
11111                .10            .10                      .10

$E(\theta \mid v)$      1.32           .52                      .82
$Var(\theta \mid v)$    1.56           1.72                     1.71

Table 2

Weighting Factors for Likelihoods when v = (1,0,*,*,*)

                                      $g_\phi(m \mid u)$
u [=(1,0,u_mis)]     Ignorability   Fully-Efficient Choice   Partially-Efficient Choice, $\phi$ = .9
10000                .10            1                        .91
10001                .10            0                        .01
10010                .10            0                        .01
10011                .10            0                        .01
10100                .10            0                        .01
10101                .10            0                        .01
10110                .10            0                        .01
10111                .10            0                        .01

$E(\theta \mid v)$      -.52           -1.84                    -1.31
$Var(\theta \mid v)$    1.50           1.20                     1.48

Table 3

Ignorability Results for Estimating $\theta$ Given Item Parameters

                                       Type of Inference
Type of Missingness      Direct Likelihood   Bayesian
Alternate Test Forms     Yes                 Yes
Targeted Test Forms      Yes                 Yes, conditional on examinee covariates
Adaptive Testing         Yes                 Yes
Not-Reached Items        Yes                 No, unless speed and ability are independent
Intentional Omits        No                  No
Examinee Choice          No                  No

Figure 1

Response Curves for Correct Response to Two Dichotomous Rasch-Model Items

Figure 3

Conditional Probabilities of Possible Paths to Responses

Figure 4

Inference about $\theta$ from v = (1,1,*,*,*)

[Three panels, each with $\theta$ on the horizontal axis: likelihood functions for $\theta$ given the $u$s that are consistent with $v$; likelihood functions for $\theta$ given $v$ under different assumptions about the choice process; posterior distributions for $\theta$ given $v$ under different assumptions about the choice process.]